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Abstract 

A new method is presented which allows time averaged density matrices of closed quantum systems to be computed via a 
constraint overlap maximization. Due to its simplicity, this method can be combined with algorithms based on tensor networks, 
as, e.g., matrix product operators (MPO). An algorithm is explained and several results for non-integrable Ising chains are 
given. Among them are scaling examples, time averaged expectation values, their variances and operator space entanglement 
entropies. 
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I. INTRODUCTION 

An isolated classical system approaches its thermal 
equilibrium by maximizing its entropy. A closed quan¬ 
tum system on the other hand can not thermalize in this 
way, since it is subjected to a unitary time evolution, 
which does not change the entropy. So, by which mecha¬ 
nism do closed quantum systems equilibrate, if they equi¬ 
librate at all? This question was already investigated 
1929 by John von Neumann in the early days of quan¬ 
tum mechanics [T], see also the comments in Ref. [2]. 
While at that time, the thermalization of a pure quan¬ 
tum state might have seem solely as an academic ques¬ 
tion, the interest in this subject has recently rekindled 
with the advent of new experimental techniques which 
allow to study almost undisturbed long-time evolution of 
ultracold atoms and trapped ions Eill]. 

To start with, we like to remark that even in classical 
physics, the definition of thermal equilibrium is far from 
trivial if we look at the deterministic evolution of a spe¬ 
cific microstate. Therefore, one resorts to macrostates, 
which implies an averaging over a vast number of mi¬ 
crostates. In statistical mechanics, an isolated system 
with known total energy is described by the a micro- 
canonical ensemble, which assigns to each microstate 
with suitable energy the same probability. For ergodic 
systems, the microcanonical ensemble coincides with the 
long-time average of the system. These two definitions 
can also be used for closed quantum systems. But here, 
the density matrices obtained from the microcanonical 
ensemble Qm.c. and the long-time average 



do generally not coincide ^ . Both density matrices gm.c. 
and Q are diagonal in the energy eigenstates basis. Yet, 
the diagonal elements of Pm.c. consist only of zeros and 
ones (disregarding normalization), while the diagonal el- 
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ements of g depend on the initial state (see also Eq. (|^, 
below). 

Instead of considering the entire system, one can also 
focus on a small subsystem. Then, the (much larger) rest 
of the system can be seen as thermal bath of the sub¬ 
system. Interestingly, in this case thermalization might 
even be obtained without any ensemble or time averag¬ 
ing. Due to entanglement, the reduced density matrix of 
a pure state is a mixed state and therefore it is possible 
that after sufficiently long time, the reduced density ma¬ 
trix of a subsystem describes a thermal state. Various 
publications have shown that this is indeed the case for 
many systems, see e.g. Ref. isHia, but also Ref. [13 [U 
as counter examples. 

One of the leading theories to explain thermalization is 
the so called eigenstate thermalization hypothesis (ETH) 
BM- Here, it is assumed that for energy eigenstates, the 
reduced density matrices of local subsystems are thermal. 
That is, an initially out of equilibrium state relaxes due 
to a dephasing of the different energy eigenstates such 
that the coherences average out. More background infor¬ 
mation can also be found in the reviews HMS- 

In the context of thermalization, it is important to dis¬ 
tinguish between integrable and non-integrable systems. 
While for non-integrable systems it is widely assumed 
that their reduced density matrices relax to a standard 
Gibbs ensemble, this is generally not possible for inte¬ 
grable systems, where an extensive number of local in¬ 
tegrals of motions Ij conserves a memory of the initial 
state. In such a case, the system can still equilibrate to 
a generalized Gibbs ensemble |7] . 

However, integrable systems have the undoubted ad¬ 
vantage that closed analytical solution might be found 
|18I[BI| - while the numerically accessible timespan of non¬ 
integral systems might not suffice to observe equilibration 
HHIO]. The situation might even be worse, if the system 
is non-integrable but has quasi-conserved local integrals 
of motions, which can result in very long relaxation times 

En¬ 
in any case, if a closed quantum system respectively its 
subsystems equilibrate, the equilibrium states must co¬ 
incide with the time averages of these states and hence, 
can all be obtained from the time averaged density ma- 
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trix (TADM) g defined in Eq. Q. In order to actually 
compute the TADM g, two methods come immediately 
into mind: One can take Eq. Q literally and calculate 
various time evolved states 'I'(^ and sum up their g{t). 
Alternatively, one can recall that the TADM g consists of 
the diagonal elements of the initial density matrix go ex¬ 
pressed in energy eigenstates \Ej). Hence, one can use di- 
agonalization techniques to find the relevant eigenstates 
of the Hamiltonian to reconstruct g. Unfortunately, for 
many systems of interest, both methods are of very lim¬ 
ited applicability. 

Here, we introduce an alternative strategy based on a 
simple constraint overlap optimization procedure. Due to 
its simple structure, the optimization can be easily car¬ 
ried out with matrix product operators (MPO) or other 
tensor network operators |22fE^ . This is demonstrated 
by various applications in Sec. |HI[ where among others 
expectation values, variances and operator space entan¬ 
glement entropies for non-integrable Ising spin chains are 
presented. Further, we compare spin chains of different 
lengths and also study the influence of the so-called bond 
dimension of the MPO on the obtainable results. 

The key idea of tensor networks as matrix product 
states (MPS) and MPO is to express high-dimensional 
quantum states and operators as products of low¬ 
dimensional matrices respectively tensors. Such an 
ansatz works fine as long as the system’s entanglement 
is limited. Unfortunately, this is not a situation we can 
expect to find in a generic TADM g. Hence, in many 
cases of interest, a tensor network can only represent an 
approximation of the TADM which is more or less rough. 
However, since our approach is based on an optimization 
principle, we can at least hope to get the best out of our 
limited resources within the chosen ansatz class. 


The quality of the solution also depends on the chosen 
type of tensor network. In this paper, we will mainly 
deal with MPO. That is, we aim directly for the density 
operator, while e.g. the methods based on time evolution 
or diagonalization primary calculate the states |5'(t)) or 
\Ej). In appendix M we will also give brief account how 


the algorithm can m used with MPS. 


Not surprisingly, the decision to calculate states or to 
aim directly for the operator entails certain advantages 
and disadvantages. For example, if the TADM is mainly 
described by one dominant energy eigenstate g « \E){E\, 
it is generally easier to compute this state \E) instead of 
the operator \E){E\. Here, the entropy of g « \E){E\ is 
close to zero. Then again, if the entropy of the TADM 
is high, i.e., if it is composed of many similarly weighted 
energy eigenstates g = J2jPjj\^j){^j\^ targeting the op¬ 
erator directly seems favorable, since a MPO can handle 
arbitrary amounts of entropy. For the same reason, MPO 
are better suited for the time average of a (local) mea¬ 
surement operator O — J2j Ojj\Ej){Ej\. 


A. Structure of this paper 


In writing this paper, we had two different types of 
readers in mind. On the one hand, the reader who likes 
to understand the basic ideas, but has no need for all 
algorithmic detail. On the other hand, the reader who 
likes to reproduce our algorithm and hence, needs all the 
details (s)he can get. The main paper should fit the first 
type of readers, while readers of the second type find 
all the information they need in a vast appendix, where 
various special topics are treated. 

The key insight of our method is that the search for the 
TADM can be phrased as simple optimization problem. 
This is explained in Sec. while a detailed presenta¬ 
tion of an algorithm solving the optimization problem 
is outsourced into the appendix: In appendix a gen¬ 
eral strategy for solving the optimization problem is ex¬ 
plained, yet without any references to tensor networks. 
The modifications needed to incorporate tensor networks 
are discussed in appendix [F| F urther improvements are 
presented in the appendices]^ |T] and 

For readers who do not intent to study the appendix. 
Sec. [IIP I provides a quick overview of the crucial ideas 
used for the numerical solution, but here, explanations 
are sparse. In Sec. |HI[ we present our numerical results. 
Finally, the main paper is concluded with a discussion 
and outlook in Sec. CYl 


II. FORMAL SOLUTIONS FOR THE TIME AV¬ 
ERAGED DENSITY MATRIX 


In this section, we look at the general structure of the 
time averaged density matrix (TADM) g and show in 
the following subsections how the calculation of g can be 
cast into the alternative form of a simple optimization 
problem. 

For the theoretical considerations, we always choose 
the energy eigenstates as preferred basis. In this basis, 
we express the initial density matrix g^ at time t = 0 as 

go = '^Pjk\Ej){Ek\ with Pjk = {Ej\go\Ek), (2) 


while at any other time t, the time evolved density matrix 
g{t) is given as 


g{t) = ^ exp 
j,k 




Pjk\Ej){Ek\. 


( 3 ) 


Inserting this notation into the definition of the TADM g 



( 4 ) 
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we obtain 


With that, we obtain for the commutator of any matrix 
M with the Hamiltonian H 


9 = 

j,k 

= '^^Ej,EkPjk\Ej){Ek\, (5) 

where we used the symbol 5Ej,Ek abbreviation for 





Ek)t dt 


for Ej = Ek 
for Ej ^ Ek ' 


( 6 ) 


For a non-degenerate energy spectrum, the TADM g con¬ 
sists of the diagonal elements of qq. In case of a degener¬ 
ate energy spectrum, g is made up of the corresponding 
block diagonal elements of go. To keep the notation sim¬ 
ple, we will still refer to these elements as ^pdiagj i-S- 

l?diag = 9 = Sej ,EkPjk\Ej) {Ek\ 

3,k 

= E P3k\E,){Ek\. (7) 

Ej —Ek 

Correspondingly, we define ^?off-diag as 

Poff-diag = E! (^ ~ ^Ei,Ek) Pjk\Ej){Ek\ 
j,k 

— E! Pjk\Ej){Ek\. (8) 

Ejj^Ek 


A. structure of the solution 

The idea we pursue to obtain the time averaged density 
matrix (TADM) g is to calculate (respectively approxi¬ 
mate) ^?off-diag (|^ and subtract it from g^ 

9 — ^?diag — ^?0 f?off-diag- (9) 

To this end, we express £>off-diag as commutator of the 
Hamiltonian and an unknown matrix M, which still has 
to be determined, i.e. 

Poff-diag = (10) 

The purpose of the commutator will become clear in 
the following. In a first step, we write the matrix 
^ = lljk'^3k\Ej){Ek\ as 

M = E mjk\Ej){Ek\ + E rnjk\Ej){Ek\. (11) 

Ej=Ek EjjiEk 

' -^ ^-V-^ 

— .^diag —^^ofT-diag 


[H, M] = [H, Mdiag] +[H, Moff_diag], (12) 

=0 


where 


[Ff,Mofi_diag] = E {Ej - Ek)mjk\Ej){Ek\ (13) 

Ej^Ek 

[H,Mdiag] = 0. (14) 

Obviously, [H,M] = [Ff, MoS-diag] is always an off- 
diagonal matrix for any matrix M . Further, Eq. ( 13 \ 
can be inverted to solve goff-diag = [H, M] ( 101, since the 
term Ej — Ek never becomes zero for Ej y^Ek- That is. 


Vpoff-diag; 3AF with f?off-diag — [-^ 5 -^]- 


(15) 


In this equation, only the off-diagonal part AFoff_diag of 
the matrix M is unique, while the diagonal part AFdi ag is 
arbitrary due to [Ff, Mdiag] = 0 (14). Inserting Eq. (15) 
into Eq. , we find 

ygo, 3M with g=go — [E[,M]. (16) 


B. Commutator operator £ 


Before we explain the advantage of expressing the 
TADM as g = go — [H,M] (16), it will be convenient 
to introduce the superoperator £ = [H, .. .] which acts 
on a matrix A as 


€A := [iF, A] . 


(17) 


Formally, £ can be seen as a tensor of fourth order 
([F7, A])*”^ = £*™-Al^. In other contexts, the superopera¬ 
tor £ is often called Liouvillian. Unfortunately, the term 
Liouvillian is not unique and also understood in other 
ways. Therefore, we will simply refer to £ as “commuta¬ 
tor operator”, which should be free of any ambiguity. 

In the following, we will not distinguish between op¬ 
erator and superoperator and also vectorize matrices as 
M writing \M). Here, we take advantage of the Choi- 
Jamiolkowski isomorphism Mjk ■ \j){k\ Mjk ■ \j) 0 |fc)- 
For the Hilbert-Schmidt inner product (which we use 
throughout this paper) of matrices A, B, we find that the 
commutator operator £ behaves self-adjoint (£A|F3) = 
(A|£H), despite the anti-hermiticity (£A)^ = —£A for 
Hermitian matrices A^ = A. Hence, we can use notations 
like e.g. || [FF, M]\\ ^ = (M|£^|M) . Since we will need this 
property quite often, we derive it explicitly 

(£A|H) := tr((FFA-AiF)1'H) 

= tr(^A^HB - HA^B 
= tr(^A^HB- A^BH 
= tr (At [iF,H]) 

= (A|£H). (18) 
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C. Optimization problem 


In this subsection, we show how the TADM q can be 
solved as optimization problem. We start with Eq. (101, 
which expresses the off-diagonal elements of Qq as com¬ 
mutator 


CM, (19) 

with a yet unknown matrix M. Formally, this can be 
solved as 


M = £ ^£»off-diag- (20) 

In appendix we have a closer look at this strategy 
and also comment on problems arising due to quasi¬ 
degenerate eigenstates, but we will not use these find¬ 
ings in the rest of the paper. Here, we follow a different 
approach. 

In a first step, we note that the TADM g = gidiag has 
a vanishing overlap with the commutator CM for any 
matrix M with finite norm, since CM is a purely off- 
diagonal matrix (in energy eigenstates) 


(pdiaglCM) ® (Cgdi^ \M) = 0. (21) 

=0 

So far, neither pdiag nor Poff-diag nor M are known objects. 
What we know is qq. Hence, let us look at the overlap of 
Po with the commutator CM 


(eolCM) 


(Pdiag T £^off-diag|^Af) 



+ (^off-diaglCAf) 


(Poff-diaglCAf). 


( 22 ) 


This simple result is one of the cornerstones for the time 
averaged density matrix algorithms we are going to derive 
in this paper: For any matrix M, the inner product of 
the commutator CM with the unknown matrix Poff-diag 
equals the inner product with the known matrix go- 
With the identity (gol^A/) = {goS-dingl^M) (221, we 
also find that the matrices M which maximize the two 
inner products are the same 


arg max 

||CM||2 = 1 




= arg max 

IlCMlp^l 


^off-diag 



(23) 


where we use 


||CM|| 2 = (CM|CM) = 1 (24) 


as normalization condition and not (M|M) = 1. Eq. 
is maximized for 


(23) 


^Af — goff-diag; 


(25) 


with c = llgoff-diagll = (£Af|goff-diag) = (CM|go)- Actu¬ 
ally, we also have to ensure the existence of matrices M 
which satisfy Eq. (25). But this we have already done in 
Eq. 

Putting all together, we find that any matrix M which 
maximizes the inner product (go|21M) under the condi¬ 
tion ||CMfy = 1 


M = arg max ( {go\€M') 1 (26) 

||£M'||2 = 1^ ^ 

also satisfies 

£^off-diag 

go-{^M\go) CM 
go - CM, (27) 

with M = (CM|go)Af. That is, the TADM g can be 
solved as an optimization problem, which only involves 
basic matrix operations as addition, multiplication and 
inner product. 



1. Alternative optimization ansatz 


It is relative straight forward to see that the condi¬ 
tioned maximization of Eq. (26) is equivalent to the un¬ 
conditioned minimization of 

M = arg min^ljgo - CM'f) 

= arg min(X2>df§X-2R.e((go|C|M')) -k 

—const. 

(28) 


While Eq. (26) and Eq. (28) are equivalent, one might 


also find alternative approaches for M which would yield 
the same g for optimal M, but result in qualitatively 
different approximations gapprox for imperfect M. In ap¬ 
pendix]^ we discuss such an approach given by 

M = arg min(^||C(go - CM')f y (29) 

This method minimizes the residual time dependence 
of gapproxi while for most other physical properties, the 
standard method described by Eqs. (26) and (28) seems 


more promising; see appendix B 1 


2. General eigenvector problem 


Instead of maximizing (go|CM) as in Eq. (26), one can 
also maximize (CMjgo) (go|CM). The advantage of this 
bilinear form is that the search for an optimal M can now 
be phrased as 


^(CM'|go)(go|CM') 

M = arg max -;-^^- 


(30) 
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which can be solved as a general eigenvector problem for 
the maximal eigenvalue A 

{G:\go){go\G:)\M) = X-€G:\M) (31) 

Unfortunately, both sides of the eigenvector equation can 
become zero at the same time for ||M|| > 0, which is a 
notorious source of trouble for the numerical treatment of 
generalized eigenvector problems. Therefore, we do not 
consider this approach as ideal and present an alternative 
strategy in the appendices |E| and [F| Still, the reader who 
has already a good and stable software solution for this 
problem at his or her disposal might give it a try, anyway. 


D. Solving the optimization problem 


In the following, we just give a short (an hence in¬ 
complete) overview of the method used for solving the 
optimization problem. Further information and omitted 
explanations can be found in the appendices ^ ^ and 
beyond, where an in-detail description is provided. 

We need to find a matrix M (271 , with gioff-diag = CM 


(19l. To this end, the matrix Mis expressed as a linear 
combination 


M = jMj, (32) 

3 

i.e., Poff-diag = For the matrices Ml, we de¬ 

mand (GlAljjCAlfc) = which allows us to obtain the 
optimal coefficients aj as 

Uj = (CMljjfiioff-diag) ® {€Mj\go). (33) 

We like to find matrices M.j with high aj, respectively 
with high overlap (dAf^jeo)- A suitable way is to gener¬ 
ate the matrices M.j iteratively as elements of a Krylov 
subspace K, 

1C = span {££< 0 , <^^£> 0 , Qo, • ■ ■, • (34) 

But in this Krylov subspace approach, we still ignore the 
fact that the matrices Aij exhibit the same exponential 
scaling with the system size as g^ itself, which generally 
foils an explicit calculation of these matrices. 

To master this exponential scaling, we resort to a ten¬ 
sor network representation |22fl25| . i.e., we use MPO (ap¬ 
pendix [F^ and double MPS, a tensor network explained 
in appendix The basic idea of a tensor network is to 
express (or approximate) a high-dimensional object M 
as a product of low-dimensional tensors M 

M = l[M[k]. (35) 

k 

This is a short hand notation, where we omitted the in¬ 
dices used for the multiplications of the tensors 
see also appendix |F 1[ The dimension of these indices 


is commonly referred to as bond dimension and has to 
be limited for a successful numerical handling. This also 
imposes limitations on the maximal amount of entangle¬ 
ment which can be represented faithfully. 

In case of a tensor network, we need an optimization 
procedure for the network tensors M[j.] (35l. Here, we 
can use the same idea as before and express each M[j,] as 
a linear combination of iteratively generated tensors M 


( 36 ) 

But it contrast to Eq. p4| , it is no longer advisable 
to generate the tensors as elements of a simple 

Krylov subspace. Here, a more elaborated iteration rule 
is needed ( [ll] ), which also takes information from pre¬ 
vious optimizations into account. This is explained in 
appendix |F3| and further improved in appendix^ 


III. RESULTS 

In this paper, we have presented a new numerical 
method, which naturally raises lots of questions concern¬ 
ing its performance. In case of highly entangled time av¬ 
eraged density matrices or operators, the probably most 
urgent question is how much insight we can really gain 
if the chosen tensor network ansatz only supports a lim¬ 
ited amount of entanglement. We strongly focus on this 
question comparing results for different bond dimensions 
U = 2" ranging from D = i to D = 512. Hereby, D 
always refers to the bond dimension used for the ansatz 
M in g = gQ — £M ( [2^ . Other interesting aspects as 
convergence properties and the achievable precision are 
addressed in appendix [N| 

As already mentioned in the introduction (Sec.|T]), inte- 
grable and non integrable systems are expected to ther- 
malize differently. Further, for integrable systems, the 
tensor network based simulation of time evolution can of¬ 
ten be done with less computational resources, i.e., with 
lower bond dimensions j26| . As an example for an inte¬ 
grable system, we look at the Ising Hamiltonian H of a 
spin chain of length L 

H = ( 37 ) 

i=i 

where cri and al denote the Pauli matrices applied to 
the jth spin. This Hamiltonian can be mapped onto a 
system of free fermions by a Jordan-Wigner transforma¬ 
tion. Also numerically, one quickly finds that the time 
average of a single operator (Heisenberg picture, see 
appendix can be described by a MPO with bond di¬ 
mension D ^ L + 2 and for the time average of the op¬ 
erator Sx = D — A is sufficient. For 

non-integrable Ising models on the other hand, such sim¬ 
plifications cannot be found. 
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For the rest of this paper, we consider the non- 
integrable Ising Hamiltonian H 


L-l 

H = -Y,a 

i=i 


^ (38) 


i=i 


72 


for which we compare spin chains of different length, L = 
13,25,51. As examples for time averaged operators, we 
look at the polarization of the central spin and 

the average polarization Sfieid in direction of the applied 
field 


_central _ 

Afield — 


5^field — 


“V2 ^ 


with c = [4] 


Cfx + CTz 


72 • L 


As states, we consider the two initial state 

0L 


|vI/+)=|+)®^=2-tQ 

1^^) = |0)®^ = 


0L 

oj ■ 


(39) 

(40) 

(41) 

(42) 


Further, we look at the ground state \Eo) of the Hamil¬ 
tonian (38) where either the central spin is flipped or the 
left and right outer spins together 

|4/central flip) — with 


^=rfi 


14-0 


■flip) = 


= ^(1) 


r(L) 


\Eo)- 


(43) 

(44) 


In the ground state, the spins are mostly in the “up” po¬ 
sition (p) such that we can e.g. expect much higher pre¬ 
cisions for l^-^) (|4^ than for |4'+) (|4^. For |4'outer flip), 
the precisions is even high enough to calculate reliable 

results for the variances of ((t 7^) with the help of the 
method explained in appendix [D| 


we mentioned an alternative optimizatio n an satz which 
minimizes the time derivative H^approxll (291 (and with 
that maximizes q), while the standard niethod used in 
the rest of the paper minimizes Upapproxli (281 without 
time derivative. This alternative optimization ansatz is 
discussed in more detail in appendix To distinguish 
these two methods, we denote them by T-|- and T— 


T+ method which minimizes ||Papprox|| 

T— standard method, which minimizes ||gapprox|| • 

(46) 

These two optimization methods are used in combination 
with two different tensor network ansatze: 1) MPO and 
2) double MPS. A double MPS is a MPS of twice the size 
of a regular MPS and acts as an operator, as is discussed 
in appendix The double MPS was chosen because 
only marginal adaptations are necessary to run the MPO 
algorithm with a double MPS. 


Fig. 0 shows a log-log plot of q ( [4^ in dependence of 
the bond dimension D for the initial states j^t^) = |0)®^® 
and |4'+) = |-|-)®^^. As can be seen, the double MPS 
allows to obtain better results than the MPO for these 
states. Not surprisingly, we also And that the T+ method 
performs better than the standard T— method, since the 
T + method was designed to generate the highest possible 
q values. 


2. Fidelity 

Nonetheless, theoretical considerations in appendix [B] 
suggest that the T— method should be better suited to 
compute physical quantities than the T-\- method. To 
verify this thesis, we studied a small and hence exactly 
solvable spin chain of length L = 13, which allows us to 
calculate the fidelity F 


A. Performance of the different methods 



(47) 


We start by comparing the performance of four differ¬ 
ent ways to calculate the TADM of the initial states l^t^) 
@ and 14-+) 


1. q value 


To estimate the quality of the approximated TADM 
Papprox without knowledge of the exact TADM g, we look 
at the residual time dependence of gapprox set it in 
relation to the time dependence of the initial state go 


l|[g,go]H _ JIgoll 

11 [H, gapprox] II II Papprox 


(45) 


That is, q is the factor by which the time dependence 
of Papprox was reduced compared to go- In Sec. HCl 


We remark that F is normally used in the context of pos¬ 
itive matrices only, while the numerically approximated 
TADM gapprox might have a few small negative eigenval¬ 
ues. Still, this has no essential influence on our line of 
argumentation. 

In Fig. the results for 1 — F are shown in a log-log 
plot in dependence of the bond dimension D for the initial 
states 14-^) = |0)®i3 and 14-+) = |-f)®i3, We still And 
that the double MPS performs better than the MPO and 
as expected, the standard T— method generates better 
results than the T+ method. 

As a consequence of these findings, we use the stan¬ 
dard T— method to compute the physical properties of a 
TADM g, while we employ the MPO based T+ method 
to compare the q values of different spin chains, as we do 
next. 
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Figure 1: Comparison of the q value (451 for different methods. The logarithm of the q value is plotted over the logarithm of 
the bond dimension D, for a) |'&+) (411 and b) IT-t-) (421. In both cases, the system size is L — 25. The different methods are 
explained in the main text (Sec. Ill A[|. For better visibility, the data points are slightly shifted - log 2 (H) is always an integer. 
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Figure 2: Comparison of the fidelity F for different methods. The logarithm of 1 — T is plotted over the logarithm of the 
bond dimension D, for a) m and b) \42\ . In both cases, the system size is L — 13. The different methods are 

explained in the main text (Sec. Ill Al. For better visibility, the data points are slightly shifted - log2(i7) is always an integer. 


B. Spin chains of different length 


C. Entanglement entropy 


In Fig. the q values (451 of the initial operators 


-central 

Afield 


(39), S'fieid (40) and the initial states 14'+) (41), 


141^) (42) are shown for spin chains of length L = 
13,25,51. Especially for the time average of the oper¬ 
ator (Fig. |a), the q value is mostly independent 

of the length of the spin chain. For S'geid and |4'+), this is 
roughly true, as well, while for |4l^), we see a pronounced 
difference. At least for this weak dependence on 

the length of the spin chain can be understood when we 
look at the operator space entanglement entropy of the 
time averages, what we do next. 


In the following five figures, we study the operator 
space entanglement entropy (OSEE) |27| in dependence 
of the position where the spin chain is split into two parts. 
Each plot consists of a family of curves, where each curve 
depicts the results obtained for one specific bond dimen¬ 
sion = 2"’ between D = A and 77 = 512. To emphasize 
the symmetry of the plots, the center of the spin chain 
is denoted as zero and the spin positions left from the 
center are addressed with negative numbers. 


















































Figure 3: Comparison of the q values (45 \ for different syst em lengths L = 1 3, 25, 51. The logarithm of the q value is plotted 
over the logarithm of the bond dimension D, for a) (39l, b) S'seid (40l, c) |4'+) (41l and d) |42| . In all four cases, 

the T+ method (461 was used combined with a MPO ansatz. For better visibility, the data points are slightly shifted - log 2 (Il) 
is always an integer. 


a) b) 



Figure 4: For the time averaged operator (391 and system sizes a) L = 25 and b) L = 51, the operator space entanglement 

entropy (OSEE) is plotted over the position of the bipartition. The OSEE is plotted for all bond dimensions D = 2’^ with 
fc = 2, 3,..., 9, whereby the OSEE is monotone increasing with D. To emphasize the symmetry, the center spin is denoted as 
position zero. 
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of the TADM gapprox —> ^?exact, it IS still a good indicator. 


Fig. 1^ shows the OSEE of (the time average of 


the operator (|39|)) for two spin chains of length 

L = 25 and L = 511 Next to the nearly equidistant 
scaling of the entropy curves with the bond dimension, 
we notice a striking resemblance between the plots for 
L = 25 and L = 51. The L = 51 appears like the trivial 
continuation of the L = 25 plot, where the OSEE is zero 
for all bipartions sufficiently far from the center. 

A closer inspection shows that the approximated 
acts as an identity operator on spins in the area 
of vanishing OSEE. This explains the findings in Fig.|^a) 
that the q value (451 is nearly independent of the system’s 
length. 

In the context of MPS approximations, it is quite com¬ 
mon that the limitation of the bond dimension induces 
exponentially decaying correlations. Usually, this can be 
understood by the mere observation that the amount of 
information a MPS can transmit is limited, while the 
amount of transmittable information increases exponen¬ 
tially and hence, has to be damped. In contrast, for the 
approximated most of the MPO’s capacity to 

transmit information appears widely unused. 


2. S'fleld 

Fig. in shows the OSEE of the approximated time av¬ 
eraged operator Sgeid (401 for two spin chains of length 
L = 25 and L = 51. fe'^for (Fig. ), a nearly 

equidistant scaling of the entropy curves with the bond 
dimension can be observed, but with a much smaller 
spacing and a distinct offset. Further, the entropy for 
L = 51 is lower than for L = 25. Interestingly, if one 
uses the equidistant scaling for a bold extrapolation to 
the maximally needed bond dimensions D = 2'^'^ respec¬ 
tively D = 2^°, one finds that the maximal value of the 
OSEE S is around S' « 2 for both system lengths, L = 25 
as well as L = 51. 


3. 14'+) and 


The OSEE of gapprox for the initial states |4'+) (41) 
and (42) is shown in Fig.[^(L = 51). The arguaBIy 
more interesting plot is the one for (Fig.j^b). Here, 
the entropy decreases with increasing bond aimension. 
This anomalous behavior might be a consequence of the 
unorthodox optimization ansatz g = go — (271. 


Besides the anomalous decrease of the OSEE with in¬ 
creasing bond dimension, we also notice a convergence 
of the OSEE. This convergence is even more distinct for 
smaller spin chains (not shown here), which is in accor¬ 
dance with the stronger dependence of the q value on the 
system size for j'l'^) (Fig. §d). Although the convergence 
of the OSEE does not necessarily imply the convergence 


I 4/outer flip) and I 4/central flip) 

The convergence of the OSEE is even more pronounced 
for the initial states |4/centrai flip) ( |43[ ) and |41outer flip) 
®l, as shown in Fig. For |4'outer flip) (Fig. |^a), the 
OSEE for the bond dimensions 128, 256 and 512 appear 
as one line and cannot be distinguished. 

Due to the excellent convergence of Papprox for 
|4'outer flip)) we also calculated the time average of the 
doubled system V 

U .— I'kouter flip)('kouter flip I O |4^outer flip)(4^outer flip I) 

(48) 

whose TADM V allows to compute variances cr^ (D2) for 
the time averaged expectation values of |4'outer flip(t))) 
as is explained in appendix jD] . The OSEE of the time 
averaged V is shown in Fig. |^and indicates a very good 
convergence, as well. 


D. Expectation values 

While the OSEE of gapprox indicates an excellent con¬ 
vergence for the initial state I'kouter flip)) the convergence 
for l^k^) is less clear and for |4'+), we see no convergence 
at all. Since this might be the more common situation, 
we look at the time averaged expectation values of |'I'+) 
and l^/^) first. As operator, we choose S’fleld (40), for 
which we have calculated the time average, as well. 

Since we have chosen a non-integrable Hamilto¬ 


nian (38), we do not know the correct results for large 
systems. The only indicators we can provide are the 
common convergence of three different methods (MPO 
and double MPS ansatz for g and the MPO ansatz for 
the time averaged operator Sfleid) and a comparison with 
a small, exactly solvable system of 13 sites. The results 
are shown in Fig. Here, the worst result is arguably 
the one for the 51 sites long |4'+) state (Fig. |^b). But 
taking the difficulty of the task into account, one might 
still find the results encouraging. 


1. Variances 

Finally, we come to the most precise results: The 
TADM for the initial state l^^outer flip) (44), which we em¬ 
ploy to determine the time averaged expectation values 


of the local Pauli matrices As already announced, 

we can take advantage of the techniques described in ap¬ 
pendix 1^ and use the time average V of the doubled 
system to compute the variances 


Var((CT('’))) = (^'outerflipkz |4'outerflip) 


(4Iouter flipl^^s |41outerflip) • (49) 
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a) b) 



Figure 5; For the time averaged operator Sfleid ( |39[ ) and system sizes a) L = 25 and b) L = 51, the operator space entanglement 
entropy (OSEE) is plotted over the position of the bipartition. The different curves belong to different bond dimensions D = 2^ 
with A: = 2, 3,..., 9, whereby the OSEE is monotone increasing with D. To emphasize the symmetry, the center spin is denoted 
as position zero. 


a) b) 



Figure 6: For the initial states a) |4'+) ( |41[ ) and b) |\['-|.) \A2\ and a system size L = 51, the operator space entanglement entropy 
(OSEE) of the TADM q is plotted over the position of the bipartition. The different curves belong to different bond dimensions 
D = 2^ , where for a) k = 2,3, ... ,9 and b) fc = 4, 5,..., 9. For |4'+), the OSEE is monotone increasing with D, while for 14^-j^), 
the OSEE is monotone decreasing with D. To have an unambiguously decreasing plot for the bond dimensions D — 4., 8 
were omitted, since for for them, the OSEE is smaller than for D = 16. To emphasize the symmetry, the center spin is denoted 
as position zero. 


We determined the time averaged expectation values 

and the associated variances for spin systems of 
lengths L = 13,25,51, see Fig. Here, we find virtu¬ 
ally identical results for the MPO and the double MPS 
ansatz. 


For L — 13, the time averaged expectation values 
and their variances can also be calculated exactly (D 81 . 
The results of our algorithm display a slight overestima¬ 
tion of the variances compared to the exact results. To 
put this finding into the right perspective, we also deter¬ 
mined the variances by sampling over 1.5 • 10® different 


l^'outer flip(^)) for times t = 0... 10® and 3 • 10® samples 
for f = 0... 10^°. Hereby, we observed a greater differ¬ 
ence between the variances belonging to the time interval 
t = 0 ... 10® and the exact result than for the result of the 
time interval t = 0 ... 10® and the outcome of our algo¬ 
rithm. For smaller time intervals, this effect is even more 
pronounced. But even for t = 0... 10^°, we still notice 
a difference between the sampled variance and the exact 
result. This indicates changes on timescales which are 
extremely long compared to the timespans which are usu¬ 
ally accessible for numerical simulations. Independent of 
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a) b) 



Figure 7: For the initial states a) I’Fouter flip) (44l and b) 14^central flip) (43l and a system size L = 51, the operator space 
entanglement entropy (OSEE) of the TADM g is plotted over the position of the bipartition. The different curves belong to 
different bond dimensions D = 2*, with fc = 2,3,..., 9. For liPouter flip), the OSEE is monotone decreasing with D, while for 
l^'central flip), the curves broadeu with increasing D. To emphasize the symmetry, the center spin is denoted as position zero. 



-- site 


Figure 8: For the initial state V ( |48[ ), the operator space entanglement entropy (OSEE) of the TADM g is plotted over the 
position of the bipartition. The different curves belong to different bond dimensions D = 2^, with A: = 4, 5,. .., 9. The OSEE 


is monotone decreasing with D. To have an unambiguously decreasing plot, the bond dimensions D = 4, 8 were omitted, since 
for for them, the OSEE is smaller than for D = 16. The state V is an artihcial double state, consisting of two system in a row, 
each of length L = 51 in the |4t outer flip) state. The zero position denotes the bipartion which separates these two systems. For 
the OSEE at this position and the consequences for the variances cr^, see also appendix D 1 


these differences, our findings clearly indicate that also 
for larger systems, the three outer spins do not equili¬ 
brate within a predictable timeframe. 


Finally, to underpin the reliability (and also the limita¬ 
tions) of our results, in Fig. M we show the q values ( [4^ 

for ^app rox — l^outer flip)(^outer flip | and. 7^ approx We 

emphasize that these are not the values obtained from 
the T+ method as in Fig. but the results of the T— 
method (461, which were also used for the computation 
of the variances. 


IV. DISCUSSION AND OUTLOOK 

We have presented a new method to compute time av¬ 
erages of density matrices and operators based on a con¬ 
straint overlap maximization. A big advantage of this 
method is that it can be easily combined with a tensor 
network ansatz, as we demonstrated for matrix product 
operators (MPO) and double MPS (appendix]^. As a 
new method, it should be compared with already exist¬ 
ing ones. Of all possible alternative methods, here, we 
consider exact diagonalization. 

Despite its name, the term exact diagonalization is 
commonly used for a numerical method. Often, the term 
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Figure 9: The approximation of the time averaged expectation value (Sfieid) 1401 is plotted over the logarithm of the bond 
dimension D. The initial states are a) 14'+) with a system length L = 13, b) |4'+) with L = 51, c) with L = 13 and d) 
with L — 51. To determine the time averaged expectation values, the time average 5fleid of the operator was calculated, 
as well as the time averaged density matrices g. For g, a MPO and a double MPS (appendix [ m| ansatz were used. For better 
visibility, the data points are slightly shifted - log 2 (Tl) is always an integer. 


exact diagonalization is also used for solutions obtained 
by, e.g., the Lanczos algorithm [25 or related iterative 
methods [221 • These methods allow to obtain faithful re¬ 
sults for some eigenvectors of the outer energy spectrum 
(especially the ground state) but the results for other 
eigenvectors in the middle of the spectrum are usually 
poor for systems of none-trivial sizes. 

Here, we only consider system sizes which do not al¬ 
low a complete diagonalization into all eigenvectors. If 
such a complete diagonalization is possible, this should 
be the method of choice. Also the iterative diagonaliza¬ 
tion algorithms can only handle systems up to a certain 
size. Beyond this limit, one might still combine these al¬ 
gorithms with a tensor network ansatz. This entails new 
complication, which will not be listed here, but see e.g. 
Ref. [30] for a detailed treatment. 

In case of an exact diagonalization of the outer spec¬ 
trum only, the decisive question is whether these outer 


eigenvectors suffice, e.g., to reconstruct the initial state 
'Fq. If this is possible, the time averaged density matrix g 
can be immediately constructed from these eigenvectors 
(although for many tasks, the explicit construction is not 
necessary). Therefore, various initial states 4'o might be 
categorized by terms like “easy”, ’’difficult” or “impossi¬ 
ble”, depending on their overlap with the eigenvectors 
of the outer energy spectrum. For the Hamiltonian de¬ 
scribed by Eq. (381, initial states like |4'+) should qualify 
for “difficult” up to “impossible” (we did not check this 
explicitly). Further, determining the time average of an 


operator 0{t) with exact diagonalization should be im¬ 
possible for nearly all commonly used operators. 


These tasks, which are difficult up to impossible for ex¬ 
act diagonalization, are also difficult for our algorithm, 
in the sense that an exact solutions requires huge bond 
dimensions which exceed our resources. Still, since hav¬ 
ing a weak approximation is still better than having no 
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Figure 10: For the initial state I'l'outer flip) (441, the time averaged expectation values and their variances are plotted over 

the site index j, for system lengths a) L = 13, c) L = 25 and d) L = 51. Figure b) shows a comparison of the variances of the 
three outer sites for L = 13, calculated with five different methods (the data points are slightly shifted for better visibility). 
The methods used for the calculation of theses three variances are (from left to right): 1/ exact result according to Eq. (D8l; 
2/ sampling of |Touter flip(i)) with t — Q... 10^'^; 3/ sampling of l^louter flip(i)) with t = 0... 10®; 4/ algorithm with double MPS 
ansatz (appendix [m|) and 5/ algorithm with MPO ansatz. For other sites beyond the outer three ones, the variances become 
to small to separate them reliably from numerical imprecision. The bond dimension is always D = 512. 


solution at all, this is probably the realm where our al¬ 
gorithm has its strongest superiority compared to exact 
diagonalization. 

For the “easy” states, future investigations have to 
show which approach is the most promising. Here, ex¬ 
act diagonalization algorithms have a certain advantage, 
since they only deal with states and not with density 
matrices, as our algorithm does. But one also has to 
consider the task at hand. For expectation values, work¬ 
ing with a collection of eigenstates is numerically favor¬ 
able, while for the operator space entanglement entropy 
(OSEE) calculated in Sec. IIIC one needs the explicit 
form of Q. Here, even for many easy initial states 
algorithm should be favorable. 

We also determined the variances (Sec. |HID1 


our 


as¬ 


sociated with the initial state I'Fouterflip) (441 and un¬ 
veiled that the outer spins do not equilibrate. Although 


|4'outerflip) should qualify as “easy” state, we are not 
aware of any previous tensor network based approaches 
which did a similar calculation. 

For future applications, it might be interesting to com¬ 
bine the TADM algorithm with other types of tensor 
networks, which allow the handling of greater amounts 
of entanglement and/or higher dimensions than the one¬ 
dimensional case treated here. Several alternative tensor 
network structures are known with different ad¬ 

vantages and drawbacks. It remains to be seen which of 
them can be integrated well in the algorithm presented 
here. 

At the end, we like to add the speculation that there 
is a certain chance that even flawed Papprox of difficult 
states might give rise to suitable results for expectation 
values, if the erroneous contributions average out. For 
a better understanding, we start with the widely ac- 
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Figure 11: Comparison of the q value (451 for different system lengths L — 13, 51 combined with either a MPO or a double MPS 
ansatz (append ix [M] !. The logarithm of the q value is plotted over the logarithm of the bond dimension D, for a) |Touter flip) 
( |44| | and b) V ( |48[ ). In both cases, the T— method (461 was used. For better visibility, the data points are slightly shifted - 
log 2 (Il) is always an integer. 


cepted assumption that for most closed quantum states 
g{t), local expectation values reach an equilibrium value, 
which they adopt most of the time. If such equilibrium 
value exists, it has to be the same for g{t) as for the 
TADM g. Looking at the off-diagonal elements of such 
g{t) = Pjk{t)\Ej){Ei;\, we find that \pjk{t)\ = const. That 
is, contrary to the TADM g, the off-diagonal elements do 
not vanish. Still, both density matrices g{t) and g have 
the same expectation values. Here, the general assump¬ 
tion is that the initially aligned pjk{t = 0) dephase and 
as a consequence, average out. 

Now, this dephasing is also an interesting aspect in 
the context of flawed gapprox- For difficult states, the al¬ 
gorithm might fail to remove all off-diagonal elements in 
Papprox) but if the residual off-diagonal elements are suffi¬ 
ciently randomized, their influence on expectation values 
might simply average out, as well. At this point, fur¬ 
ther investigations are needed to decide, whether or not 
the algorithm really randomizes the residual off-diagonal 
elements. In any case, we remind the reader that our 


algorithm does not introduce errors by altering the diag¬ 
onal elements of the TADM g, which is e.g. not true for 
a flawed diagonalization algorithm. 
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Appendix A: Inverse problem 


Here, we study the possibility to solve the time aver¬ 
aged density matrix (TADM) g as inverse problem. We 
start with Eq. (101, which expresses the off-diagonal ele¬ 
ments £»off-diag of go as Commutator 


Hoff-diag — [.H, Af] 



(Al) 


look at the formal structure of the inverse problem, since 
with Eq. (A2|, we have establish a link to a well known 
class of problems. As a bonus, the algorithms derived for 
the calculation of the matrix M might also be used for 
other inverse problems, arising from completely different 
tasks. 

In our particular problem, we have to keep in mind that 
€ has a non-vanishing kernel CMdiag = 0 (14l. Hence, 
should be a well defined Pseudo-Inverse. That is, 
while £-£“^eoff-diag = £*ofi-diag) we also have to demand 
that 


e:-e:~^^?diag = o. (A3) 

As a consequence, we find 

€■ €~^go = €■ €~ ^ goiag + €-€~^ fioff-diag 

0 ^off-diag 

— f?off-diag- (-^4) 

With that, the ansatz described by Eq. ([^ reads 

g = go - €-€~^go, (A5) 

which evidently would not work if were a regular 
inverse. 


1. Quasi-degenerate eigenstates 

A key aspect of our algorithm is the distinction be¬ 
tween diagonal and off-diagonal elements, respectively 
the difference between the two cases Ej = Ej, and 
Numerically, this distinction becomes blurred 
for quasi-degenerate energy eigenvalues Bg, with 
Eg — Er = £, where 0 < £ <C As a conse¬ 

quence, if the energy difference e becomes to small, the 
calculated time averaged density matrix g might contain 
non-zero matrix elements pgr ^ 0, which should actually 
be zero. 

To be fair, one has to mention that quasi-degenerate 
energy eigenvalues pose a general problem, which is not 
restricted to the method presented here. To illustrate 
this, let us replace the idealized limit T —> oo in the 
definition (|^ of the time averaged density matrix g = 

lim 7 ’_>oo g{t) ■ dt^ by a more realistic finite value 

of T. As long as for this T the condition 

(Eg - Er)T > h (A6) 

holds, Eq. (§ is still a good approximation 


Our task is to find a suitable matrix M. Formally, this 
is solved by 

M = €r^ gos-diag- (A2) 

We do not know the inverse operator and generally, 
it is much more demanding to construct than to find 
a suitable M. Still, it seems beneficial to have a short 


J exp ^—^{Eg — Er)t^ dt Ki 0. (A7) 

But for quasi-degenerate energy eigenvalues, the condi¬ 
tion {Eg — Er)T h might no longer hold and the den¬ 
sity matrix pT averaged over the finite timespan T might 
contain off-diagonal matrix elements pgr which deviate 
substantially from zero. 
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Appendix B: Alternative optimization ansatz 

Throughout this paper, we follow the strategy to ex¬ 
press the TADM as g = go — €M ( |27| . In a realistic set¬ 
ting, the optimal M will be a highly structured object, 
which is often far to complex to be represented faithfully 
on a classical computer. Therefore, we have to settle for 
good approximations of M. This raises the question, how 
the optimal approximation of M respectively g should 
look like. The answer to this question is not unique and 
closely related to the question how we actually measure 
the quality of a given Papprox- 

In general, we need a quality measure which is numeri¬ 
cally calculable with the limited computational resources 
at hand and which does not require knowledge of the ex¬ 
act TADM g. Under these conditions, the arguably best 
choice is the residual time dependence of gapprox, respec¬ 
tively 

II £*approx II = llQapproxll ^ 0, (Bl) 

which vanishes for a perfect approximation. Normally, 
in case of ||Clgapprox|| = 0, the diagonal elements of 
Papprox (expressed in energy eigenstates) could still be 
erroneous. But for our method, which is based on the 
ansatz Papprox = 0o — CIM ( p^ , the diagonal elements 
are guaranteed to be flawless, since €M is a purely off- 
diagonal matrix. Hence, ||£papprox|| = 0 if and only if the 
approximation is perfect. 

Now, let us try to And the matrices M which mini¬ 
mize IlCpapproxll = || 21 (e 0 - ^M)\\ 

M = arg min^llC (^>0 — C1M')||^^ 

= arg min(]2et€^iSaI-2Re((£<o|£^l^')) 

=const. 

+ {M'\€'^\M')^. (B2) 


The error in any expectation value made by using 
Papprox instead of the exact g stems entirely from residual 
off-diagonal elements Pjfe II in gapprox withp^fe ^ 0. 
Both methods try to remove these off-diagonal elements, 
but for the minimization of |l£papprox|| i elements with 
small energy differences \Ej — Ek\ have lower priority 
than elements with high energy differences. Generally, 
there is no reason why elements with high energy differ¬ 
ences should have a stronger impact on the error than 
elements with low energy differences. It is even more 
likely to assume that for a local operator Oiocah the un¬ 
wanted contribution {Ek\0\ocai\Ej) is close to zero if the 
energy difference \Ej — Ek\ is high. 

For the remainder of the paper, only the minimization 
of ll^approxll respectively Eq. (261 are explained How¬ 
ever, the adaptations to be made to handle Eq. (B3l and 
with that the minimization of ||£papprox|| should be quite 
obvious, once the principles of the algorithm are under¬ 
stood. 


Appendix C: Time averaged operator and error re- 
dnction 


The TADM g is an object which is associated with 
the Schrodinger picture, where states are time dependent 
and operators are time independent. In the Heisenberg 
picture, where states are time independent and the oper¬ 
ators are time dependent, the analog to the TADM is a 
time averaged operator O 


As done for the initial operator Oq at time t = 0 can 
be written as 


This minimization can be done in two steps. First, we 
determine the optimal Mnormed under the condition that 
{M\€‘^\M) = 1, i.e. 

Mnormed = arg max ({go\€^\M')Y (B3) 


Once this Mnormed Is found, we can rescale it similar to 
Eq. (27). Hence, Eq. (B2| becomes minimal for 


M = Mnormed] go) ' Mnormed- (B4) 


1. Which method is better? 


(C2) 

3,k 


Analog to Eq. ([^ , the time averaged operator O consists 
of (block-)diagonal elements only and can be expressed 
as 


O — Odiag — Oo Ooff-i 


diag- 


(C3) 


Follo win g the same line of argumentation that led to 
Eq. (271 for the TADM g, we And that O can be ob¬ 
tained as 


The minimization of Eq. (B21 results in a gapprox with 
the smallest possible time dependence. But how about 
other physical properties as expectation values? Here, 
the standard method which minimizes UgapproxH (28) 
should still be the better choice, as we discuss in me 
following. 


O = 0o- (CMlOo) ■ CM, (C4) 

where M has to be chosen such that the inner product 
{Oo\€M) is maximized and ||£M|p = 1. 

Using the decomposition into diagonal and off-diagonal 
elements g = £)diag and Oq = Odiag + Oog-diag, we And for 
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time averaged expectations values (O) 


of reasoning are 


(o) 


(elOo) 


{l?diag|^diag “f ^off-diag) 


{l?diag I ^diag } 





(C5) 


where we used that (Adiagl^off-diag) = 0 for any matri¬ 
ces A, B due to structure of the inner product {A\B) = 
Similarly, we find 


(e|0o) = (e|0) = (^>o|0). (C6) 


The possibility to use both time averaged matrices g and 
O to calculate the time averaged expectation value (O) = 
offers a potential way to reduce numerical errors, 
as we show next. 


1. Error reduction 


— (£)approx|Oo) (O) — (^off_diag I *^off-diag) 
iff ^ = (l?o|Oapprox) — (o') = (iPoff-diagl^off^diag)' 

To compare the errors iff^l and E'^^^ , we start with 

a simple error model, which assumes that all off-diagonal 
elements are roughly damped by the same factor e < 1, 
giving us 


c[q] 

^off-diag 

c[0\ 

^off-diag 


^£^off-diag 

^^off-diag- 


(Cll) 


In this case, (poff-diaglOofi-diag) scales with the 

square of e, while = E'^^^ = e (poff-diaglOofi-diag) are 
only linear in e. 

Unfortunately, for our MPO based calculations, this 
error model proved to be insufficient. Instead of using 
one damping factor e for all matrix elements, it seems 
more adequate to use individual damping factors Sjk for 
each matrix element 


For most non-trivial cases, any numerical optimization 
rout ine will only be a ble to find approximated matrices 
M (271 and M (C4|, which deviate from the optimal 
ones. In this case, the off-diagonal elements of the ob¬ 
tained gapprox and Oapprox do not vanish completely, as 
they should. That is 


^^approx 

^approx 


^diag 


c[^] 

''ofF-diag 


D -L 

Wiag -r <^off-diag’ 


(C7) 


where, ^og.diag and fog^^iag represent the erroneous off- 
diagonal elements. In any case, the diagonal elements are 
always correct for the approach presented here. With 
this, we find for the time averaged expectation value 

(o) 

\ /approx 


^Idiag = E ^tpMEj){Ek\ 

Ej^Ek 

^Iff’diag = E 4^o,k\E,){Ekl (C12) 

where pjk and Ojk are the matrix elements of qq respec¬ 
tively Oq. With that, we obtain 

El,A s ^ 

Ej ¥^Ek 

eW Iffil y; 

Ej jiEk 

Eioi y- ,C 13 ) 

Ej ¥=Ek 


/n \ /n 

\ ^approx/ — ^t'app: 


^approx) 

\Oki- \ 

— \ fe'diag -I- <^off-diag I r>'diag i- <^off-diag/ 


— ( gdiag I Odiag ) + ('^off-diag | '^off-diag ) 


(C8) 


where we used again that (Adiag|Soff-diag) = 0. Evidently, 
in case we use both approximated time averages gapprox 
and Oapprox) the error time averaged expec¬ 

tation value (Oapprox) is given as 


^[e,0] _ (o^ _ (fog_jiag|'^foff-diag)- 

(C9) 


If one resorts to (O) = (^lOo) or (O) = (^olO) instead, 
the expressions for the errors is obtained by the same line 


For statistically distributed |ef^| ^ 1 ^ error 

ifle.O] generally be smaller than the errors E^e] and 
But we can also find an error model, where this is 
not the case. Let us suppose that most matrix elements 
belonging to the bases \Ej){Ek\ are either extremely easy 
to approximate or extremely difficult. In the first case, 
we expect £^1 « « 0, while in the second case, £^| « 

« 1 seems an adequate assumption. Evidently, for 
this error model eff *-£f^f « and with that, all 

three errors and E^^^ are roughly the same. 

Even for the model, where £^f and only adopt the 
two values zero and one, using both time averages gapprox 
and Oapprox to calculate (0) offers still an advantage, if 

the values of and are not correlated, i.e., if the 
combinations £^^ = 1, = 0 and = 0, = 1 are 
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realistic. The likelihood of having uncorrelated and 

e^^f} might strongly increase, if ^approx and Oapprox are 
calculated by two different methods. Unfortunately, the 
error reduction relies on the property of the commutator 
based method (27) that the error is entirely restricted to 
the off-diagonal matrix elements of Papprox and Oapproxj 
while the diagonal elements are 100% accurate. For other 
methods, this is not necessarily the case. 

Further, we have to ponder the computational effort to 
contract (gapproxlOapprox)- While for two MPO, this ef¬ 
fort is acceptable, this is e.g. no longer necessarily true if 
Papprox is given as double MPS (appendix|^ and Oapprox 
as MPO. For two double MPS, the effort would be ac¬ 
ceptable again, but it seems unlikely that a good operator 
approximation 0, 


approx 


can be obtained with an double 
MPS ansatz. In our MPO based applications, the error 
correction only provided a slight improvement. 


Appendix D: Variance of expectation values 


Here, we study the possibility to compute the variance 


= Var 


of the expectation value (p(t)|0) 


with respect to time for an arbitrary operator O 


cr^ = (p(t)|0) - (£-(t)|0) , 
where the overbar 7TT indica tes t ime average. 


(Dl) 


The second term in Eq. (Dl) is simply (^g{t)\0') = 


(p|0) (C6l, while the first term (p(t)|0) needs a more 
thorough treatment. In order to find the time average 
with the means presented in this paper, we write 

{g{t)\Of = {g{t)\0) {g{t)\0) 

= {Qi.t) O 0{t) I O 

'' P(t) ^ O 

= imio). (D2) 

That is, by squaring the Hilbert space respectively dou¬ 
bling the quantum system, we formally transformed the 
quadratic expression {g{t)\0)'^ into the linear expression 
{'P{t)\0). This allows us to proceed as follows 

{g{t)\0)^ = {e{t)\Ofdt 

= nt)dt)\o) 

= {v\0). (D3) 

Alternatively, we could also compute the time averaged 
(5 instead of P, since {V\0) = (Vo\0) (C6). To calculate 


'P = Udiag, we can use Eq. (27\ V = where 

A1 is a suitable matrix we nave to find and T-L = H ® 


1 -I- 1 0 iJ is the Hamiltonian of the doubled quantum 
system. The chosen structure of the Hamiltonian T-L is 
easily understood when we look at the time evolution of 
V{t) = g{t) ® g(t) 


g{t) ® g{t) 


(D4) 


with h '■= 1. 


1. Fully equilibrated systems 


Analog to Eq. (D2|, we can write (p(t)|0) = {g\Oy 
as {g\Oy = ® g\0 0 O) and with that 


cr^ (P|0 0 O) _ (gi 0 p|0 0 O). (D5) 

If a quantum system can fully equilibrate in the sense 
that the variances cr^ vanish for all operators O, we must 
have 


VO : 0-2 = 0 © V = g®g. (D6) 

That is, "P is a product state consisting of two g. Con¬ 
versely, the amount of entanglement between the two 
subsystems in V can be regarded as indirect measure for 
incomplete equilibration. For most systems, demanding 
that the variances cr^ vanish for all operators (not just the 
local ones) should be too strong. On the other hand, we 
can always restrict the operators to a certain subsystem 
A by tracing out all parts of V which do not belong to A 
and look for the entanglement in the remaining system. 


2. Energy eigenstates 

The eigenstates \8) of the Hamiltonian T-L = H ® 1 + 
1 ® H are simply given by \Ej) ® \E]^), where \Ej) are 
the eigenstates of ET. Both eigenstates \Ej) ® \Ek) and 
\Ek) O \Ej) have the same eigenvalue Ej -f Ek- Hence, 
the spectrum of Ti is always degenerate and "P is a block- 
diagonal matrix. 

On page we introduced the convention to refer to 
the block-diagonal elements as “diagonal”, as well (see 
Eq. ( 0 ). Here, for once, we have to discriminate between 
diagonal and block-diagonal. Actually, g®g and V have 
the same diagonal elements, but in addition, V has some 
extra block-diagonal elements due to the afore-mentioned 
degeneration. Assuming that the spectrum of El itself is 
non-degenerate and that the spectrum of EL exhibits no 
further degeneration beyond the one identified above, we 
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find that 


against its precursors Aij^n by the following steps 


Q® Q = '^PjjPkk\Ej){Ej\® \Ek){Eu\ 

i,k 

V = Q ® Q E^PjkPkj\Ej){Ek\ ® \Ek){Ej\, (D7) 

with Pi k = {Ej\Qo\Ek)■ Inserting this result into 
Eq. (D51, we find for the variance of the time averaged 
expectation value (O) 


~ 'y / WPjkOkj II J 


(D8) 


with Ojk = {Ej\0\Ek). 


Appendix E: Solving the optimization problem 
general approach 


n—1 


M'n = 

3 = 1 


M„, = Um'W ^M' 


(E4) 


The correctness of this procedure is easily seen when we 
multiply the two equations from the left with £, which 
turns the procedure into the standard Gram-Schmidt or¬ 
thonormalization for matrices = it Ain- 


Using Eq. (E21 for Poff-diag and the orthonormalization 
Eq. (E31, we mid 


(CAIfelpoff-diag) - 


P3 

- Oik- 


(E5) 


According to Eq. (221, the overlap (ClAIfelpoff-diag) with 
the unknown Poff-diag is the same as the overlap {tAik\Qo) 
with the known qq, such that 


In Sec.|n] we have seen how the calculation of the time 
averaged density matrix (TADM) q can be formulated as 
a common linear optimization problem with quadratic 
normalization condition. Here, we study a general ap¬ 
proach to solve this problem, whereby, we assume that 
the needed matrix operations can all be executed. Due 
to the exponential scaling of the Hilbert space, this as¬ 
sumption is usually only justified for quantum systems 
consisting of very few particles. Still, other more pow¬ 
erful methods might adopt the ideas of this section, as 
we will demonstrate for the MPO based approach in ap¬ 
pendix ^ 


We like to construct a matrix M which fulfills Eq. (19) 


£*off-diag 


® CM ® [H, M] 


and hence allows to calculate g = go — Poff-diag (|^- The 
strategy we are going to pursue is to approximate M as 
a sum of iteratively generated matrices Aij 


M = Y, <^3-^3 

3 

£'off-diag ® (E2) 

3 


Uk = {€Aik\go) (E6) 


can be calculated for any given Aik- 

In other words, we are able to project the unknown 
matrix Poff-diag onto a set of orthonormalized commuta¬ 
tors €Aij - After n iteration steps, the absolute difference 
between Poff-diag and its approximation i® 


n 

£*off-diag gjClAIj I — ^11 £*off-diag 




3 = 1 


(E7) 

Since we do not know the exact value of Ugoff-diagll ^ 
IIPoll) we have no good estimator for the quality of the 
approximation. If we had used the numerically more de¬ 
manding Eq. (B3l instead of Eq. ( [2^ as optimization 
objective, we would have obtained ^ 1- 

But here, we can only state the obvious that the approx¬ 
imation gets better with each new matrix Ain+i with 
a„+i 7 ^ 0. For fast convergence, we like to find new ma¬ 
trices Ain+i with |an+i| which are preferably as big as 
possible. This is what we are going to study next. 


1. Generating the matrices Mj 


with ttj G C. The matrices Mj are modified in such a 
way that the commutators tAij build an orthonormal 
system 


(€Ai,ltAik} = Sjk- (E3) 

To do so, we can use an iterative method similar to the 
Gram-Schmidt orthonormalization. Any time we gener¬ 
ate a new matrix Ain, it is orthonormalized A4„ —>■ A4„ 


In this subsection, we present a method to generate 
suitable matrices Aij for Eq. (E21. To start with the 
first matrix AI i, we like to find a A4 1 with a big absolute 
value |ai| (E2), which is according to Eq. (E6l 


ai ^ (CAIiIpo) ® {Aii\tgo) - 


(E8) 


Therefore, as educated guess, we choose AAi = £po- 
Here, a tilde is used to distinguish the matrices }Aj from 
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the matrices Aij which are already correctly orthonor- 
malized according to Eq. (E3). For A4i, this simply 


1-1 


means Mi = ||GlA^i|| Mi. 

Now, what do we choose as second matrix Ad 2 ? If we 
use the same ansatz as for Adi, we have 02 = (Ad 2 |Cl£>o) 
and hence M 2 = ^(>0 ~ but that is the result we already 
had for Adi, so we cannot use it again. This problem 
actually occurs for all Adj>i. To solve it, we have to go 
back to the first line of Eq. (E5l 
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(CMnlPoff-diag) ^ ^a,(e:Ad„|e:Ad,) 

3 

^ {€Mn\Qii) = ^a,(e:Ad„|G:Adj) 

3 

{€Mn\go-'^aj€Mj) = an{€MuKMn) 

(M„\Cgo-’^ajC^Mj) = a„(Q:Ad„|G:Ad„). 
j¥=n 

(E9) 


Now, we introduce two approximations: First, we ignore 
the quadratic term (ClAd„|£Ad„) (i.e., treat it as = 1). 
Second, since we do not know Adj>„, we simply ignore 
them (which is the same as demanding that Ad„ will be 
perfect and no more Mjyn are needed). With that, our 
educated guess becomes 


n—1 

Ad„ = Cpo - ^ aj€^Mj. 

j=i 


(ElO) 


This can also be written as Mn = M n-i — ctn-i^'^Mn-i. 
We still have to orthonormalize Ad„ (E4|. Therefore, we 


can drop the term Mn-i in M„, since it can be expressed 
as linear combination of the previous matrices Adj<„ 


Mn —>■ Mn — d Mn-l- 


(Ell) 


Now, M„ is obtained orthonormalizing Ad„. From 
Eq. (EllI, we find by induction that the matrices Ad„ 


can be expressed as 


Ad n — ^ ^ ‘^nj ^ ^ go^ 
i=i 


(E12) 


with some appropriate coefficients 7 „j. Hence, the sub¬ 
space spanned by the matrices Adj is 

span {Adi, Ad 2 , ■ • ■, Ad„} = span { Cpo, £^ 00 , • ■ • 




which has the structure of a Krylov subspace. 


^00 } ) 


(E13) 


a. Orthogonality 

At the end , the matrix M„ c an be o btaine d orthonor¬ 
malizing ( |E4| either Mn (ElO I or Ad„ (Ell I against the 
previous matrices Mj^n- Actually, for Ad„ it already 
suffices to orthonormalize it against the last two matri¬ 
ces Mn -2 and Mn-i to ensure that it is orthonormal to 
all matrices Mj<cn. For Ad„, it is even sufficient to or¬ 
thonormalize it against the last matrix Mn-i only. The 
proofs are given in appendix]^ Still, in practical calcula¬ 
tions, such orthogonality results for iteratively generated 
Krylov subspace basis are often undermined by numerical 
imprecision and should be handled with care. Therefore, 
one might still consider to orthonormalize each new ma¬ 
trix against all previous Mj. 


2. Discussion of the general method 


To generate suitable Adj (|E2|, we have studied a 


method building up a Krylov subspace, which is a com¬ 
mon approach, as, e.g., used by the Lanczos algorithm 
pS] to find eigenvectors. Therefore, we have a closer look 
at the inner structure of the solution, which might help 
to put the method into the right perspective compared to 
alternative methods. This is a special topic, which might 
be skipped. 

Combing Eq. (E2| and Eq. ( E12| , we find that after n 
iterations, Poff-diag is approximated as 


^off-diag 


1 


i-1 

n 3 

j=l k=l 

n 71 

= X! with /3j = ^ ap-fpj. 

i=i p=3 

(Eld) 

Evidently, this ansatz uses only even powers of £. This 
corre spo nds with the general eigenvector problem in 
Eq. (31l. Here, we find the two operators and 
Cl|£*o)(^|Ci, where the latter can be absorbed into the nor¬ 
malization (E3). Solving the general eigenvector prob¬ 


lem by a Lanczos algorithm using the same initial vector 
would result into the same Krylov subspace. Still, one 
might wonder whether odd powers of £ could possibly be 
useful, as well. For the commutator operator ( [l7| , the 
answer is definitely no. For odd powers of the commu¬ 
tator operator, the matrix has zero overlap with 


the matrices go and £<off-diag, as we prove below (E19l. 


(i>off-diag|£^^+Vo) ^ (goie^^+^go) ^ o. (eis) 

Even with this vanishing overlap, the odd power terms 
go could still be useful in an indirect way, if they 
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had an overlap with any of the even power terms qq 
which have non-vanishing overlaps with qq. But this is 
never the case 


^ ^ o. (ei 6 ) 


Hence, the terms with odd powers of £ are of no help to 
approximate £ioff-diag- 

We still have to prove Eq. (E15l = 0. To 

see this, it is useful to expand the commutators £” 


(0o|e:>o) ^ M {0o\H--^goHP) , (E17) 

p=0 

with the binomial coefficients as in (a — 6)” = 
J2p=o (~1)^ Since Qq and H are a Hermitian 

matrices, we can use the cyclic property of the trace to 
derive 


{go\H--^goHP) = (golHPgoH--^) . 


(E18) 


Using this equation together with the identity (”) = 


(n-p)^ we can add the terms for p and n — p in Eq. (E17) 
and use their average to obtain 


(6'oK"'eo) = 


(-If+ (-1)"-^ 


P—0 


(go\H--’^goHP} . 

(E19) 

Since (—1)^ + (—1)"”^ vanishes for all odd numbers n 
independent of p S N, the inner product (£>o|21"po) is zero 
for odd powers n = 2j + 1 of £, as claimed in Eq. (E151. 


The ansatz described by Eq. (E14) does not make use 


of any terms H^~p goH^ with odd n. It is not that these 
terms are not useful per se. They are just not generated 
in a useful combination applying powers of the commu¬ 
tator £ on pq. If we produce these terms by other means, 
they can be part of the approximation (E2| of Poff-diag) 
as well. 

The reason for using the commutator £ was that it 
generates matrices which have no overlap with the time 
averaged density matrix g. In other words, the compo¬ 
nents \Ej) {Ek\ vanish in the matrices generated by the 
commutator £ for identical energy eigenvalues Ej = 
Another way to guaranty the vanishing of these compo¬ 
nents is resorting to matrices 


^ / apH^^ PgoH^, with 


P—0 


y Up = 0 , Up G c. 

P—0 

(E20) 

This can be seen inserting go = J2Pjk\Ej) {Ek\ in 
Eq. (E20l. Actually, the matrices £"po build a subset 


of the matrices ^ (E20). Another example for a subset 


of the matrices ^ are the matrices 9Jt = € {EL^g^El'^), 
with arbitrary p, g € N. 


A special situation arises, when the initial state is pure 
go = I'I'o) ('I'ol- Then, inner products as 

{go\HPgom) = (vI/o|i?P|'I^o) (^o|i?«|^o) (E21) 

are easily calculated if we know all i7’'|4'o). In this case, 
we could also use a diagonalization method based on the 
Krylov subspace 

= span {I vf/p), 77 |vI-o) ,..., 77”-^ |vI-o)} , (E22) 

as the Lanczos or Arnold! algorithm |28l 155] to obtain 
approximated energy eigenstates \Ej). With these, the 
TADM g can be approximated as 

Papprox = |Ej)(7fjj. (E23) 

3 


We emphasize that the Papprox obtained by this Eq. (E231 


is not suitable for the error reduction method introduced 
in appendix |C 1| For this method to work, it is essential 
that the diagonal elements of the approximated TADM 
Papprox are error-free, i.e. 


(7f,|papprox|7ffc)'"''=^ {E,\g\Ek). 


(E24) 


This is generally not true for Eq. ( |E^ , while it is guar¬ 
anteed for the method introduced here (271. 


Appendix F: Time averaged density matrix as ma¬ 
trix product operator 

In appendix we explained an algorithm for finding 
the time averaged density matrix (TADM) p. But so far, 
this algorithm does not solve the main numerical problem 
which usually hinders us to calculate p: The exponential 
scaling of the Hilbert space with the number of the con¬ 
stituents and the associated demand for computational 
resources. In this section, we address this problem and 
present a monotone converging optimization algorithm 
based on a matrix product operator (MPO) approxima¬ 
tion, which allows the handling of the exponential scaling 
(see overview articles p51 - E5] b 


1. Short introduction to MPO 

Any matrix operator M acting on a system consisting 
of n sites Sj can be written as 

M= Y. (fi) 

with a high-dimensional tensor The idea of a 

MPO is to express this high-dimensional tensor ®!s(...s^ 
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Figure 12: Graphical representation of the MPO for the com¬ 
mutator operator £ (171 and the matrix M. Each MPO tensor 
is symbolized by a shape. Auxiliary indices are depicted by 
horizontal lines, while open vertical indices represent physical 
indices. 


O = 0[j] might demand tensors 0[j] whose size scales 

exponentially with the number of constituents. In this 
case, one can still use tensors 0[j] of limited size to ob¬ 
tain an approximation O « ®b]- 

Once we put restrictions to the size of the MPO ten¬ 
sors, MPO no longer build a vector space. That is, adding 
two MPO might create a sum MPO whose tensors exceed 
the preset size limit. The same is even more likely for the 
product of two MPO. Therefore, we cannot use a simple 
one-to-one mapping to cast the algorithm presented in 
appendix]^ or any other algorithm into MPO form. 


as a product of n low-dimensional tensors M [jj. 


2. MPO tensor optimization 




oti •••a. 


[l]siSi 


l\OilOt2 

[2]s2s'^ 




• M, 


(F2) 


where Sj and s' represent the physical indices, while the 
Uj are auxiliary indices which are summed over. In case 
of closed boundary conditions (which we use here), the 
dimension of the outer index is set to one (i.e., we can 
ignore this index). 

Since Eq. ( |F2[ ) is not really a well-readable expression 
(and it becomes even worse if we consider e.g. products 
of operators), one often resorts to graphical representa¬ 
tions, as in Fig. 12 Here, we also use the (non-standard) 


symbolic shorthand notation 


M = Y[M 




(F3) 


which represents the joint information of Eq. ( |F1[ ) and 
Eq. (F2|. In the same way, we write Cyj for 


the commutator operator, where each tensor 


'[j] 


S S - S -j S ■ 


carries the four physical indices Sj , s', sj , s'. For the con¬ 
struction of the MPO tensors of £, see also appendix [L| 
To calculate an expression like €M = C[j] • M[j], we 
need to sum over the common physical indices 


b]sjS,SjS b]sjs 


(F4) 


and the auxiliary indices. To shorten the notation, in the 
following we will often use the multi-index Uj = 
for the two physical indices Sj, s' of a MPO tensor. 

Many operators which are relevant for practical cal¬ 
culations allow an exact MPO representation based on 
tensors M[j] (F3l which are of relatively small and con¬ 
stant size, independent of the total number of the con¬ 
stituents. Here, we assume that this is also true for the 
initial density matrix go = rij Pbi and the Hamiltonian 
H = J([^- H[j]. If it is true for the Hamiltonian, it is also 
true for the commutator operator e: = n, C[,i 

Unfortunately, this is not true for all operators. For 
a general operator O, an exact MPO representation 


Assume we have two MPO M' = and M" = 

^b] differing only by one tensor Mj^j ^ (which 
still have the same dimensions), while all other tensors 
are identical Symbolically, we 

write these MPO as 

M' = n Mb] 

3¥^k 

M" = M",nMb]. (F5) 

3^k 

For a,(3 G C, we find 

aM' + PM" = (aM{,] + PM'(,^) [] M^,. (F6) 


The important observation is that the tensor + 

has the same dimensions as Mjj,j and That 

is, adding the MPO M' and M" does not conflict with 
any preset limits for the size of the MPO tensors. Hence, 
MPO of the type M' and M" which differ only by a single 
tensor still build a vector space. For this reason, theses 
MPO are much more suitable to realize an adaptation of 
the algorithm explained in appendix [E| 

We express the TADM as g ~ c£M (271, where 
c G C and 


M = nMb] (F7) 

3 

is the MPO we have to find. Our task is to design an 
algorithm which optimizes a single tensor while all 
other tensors are kept constant. Once we have 

this algorithm, we can apply it in repeated sweeps of the 
index k to all tensors in the MPO. 

In appendix we introduced a general algorithm, 
which expresses M as a sum M — aiMi ( |E2[ ). Here, 
we adopt this idea but extend it by the demand that the 
Mil are all MPO with the same tensors 

M = M«nMb]- (F8) 
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Figure 13: Graphical representation of the MPO overlap 
{go\i\M). The area Flp] = defines the ten¬ 

sor environment of the tensor Mpj. This corresponds to the 


situation in Eq. (|F11|), with = ^[ 5 ]' 


The MPO Ml differ only by the tensor Due to the 
linearity described by Eq. (F 61 , the summation of the 


MPO M = OLiMi (E2) is equivalent to the summation 
of the tensors Mj^j 


is basically the same procedure. In the orthonormaliza¬ 
tion, as well as in Eq. (F12l, we encounter the term 


(n, 


Lk M[i]) 




Fig. (appendix Q 


Lk M[j], 


which is also depictured in 


3. Advisable modification of the algorithm 

So far, we have presented a direct adaptation of the 
ideas presented in appendix |E l|to obtain the locally op¬ 


timized MPO tensors (F9|. But this adapta¬ 

tion is far from being optimal and does not result in a 
monotone converging algorithm. This will be mended in 
this subsection. 

To see the problem, imagine that for some reason, we 
have already found a perfect MPO M, where all MPO 
tensors are optimal. Now, let us denote 


= M[,] 


(F13) 


optimized _ V ^ 

'[fc] - 


ai\ 


.(0 


(F9) 


resulting in the locally optimized tensor 

If we follow the ideas outlined in appendix |E 1| to 
obtain suitable MPO Mi, we need to generate linear 


combinations of €Pgo (E13l, where p denotes an expo¬ 
nent. How can we calculate while we are at the 

same time forced to keep all MPO tensors un¬ 

changed? Here, we have to remind ourselves that the 
original intention in appendi x |E 1| was to maximize the 
overlap (polCM) = (po|C|M) ( |E8| ). Inserting Eq. ( |F8| ) in 
(eo|£|AIi)) we obtain 


(eolclA^i) ^ 




= tr((^,oe:nMbi)Mg). (Fio) 




This is maximized for 
i(i) 


M 


[,]=MnMw)^=(nMbi)'£eo, 

j^k j^k 


(Ell) 


see also Fig. In the same fashio n, we can generate 


the tensors analog to Eq. (|E10l 


.(0 

'[fc] 


i-i 


“ (IT ^bi) (*^^o ttpC^Mp), 


(F12) 


with Mp = 


nip) ■ 
'[fc] 


used a tilde to mark that M has still to be orthonor- 


nj 5 ^fcM[j] (F8l. As in Eq. (ElOl, we 


[fc-] 


malized (E3i against the other tensors. Due to the lin- 


sor Mjpj or the associated MPO Mi = Mj(,j 


earity expressed by Eq. (F6), orthonormalizing the ten- 


(0 ■ 


and apply the 


opt imization procedure presented in the 


F2 


to find a new tensor 


>|Optimized 


'[fc] 


. Since 


last subsection 

is already optimal according to our assumption, 

/loptimized 


the optimization ivij^j 
produce the tensor 


= (|F9|) should re- 


This is for sure true, if we 


sum up sufficiently many tensors But in practical 

applications, one would usually just calculate a handful 
of these tensors 




1,* • ^/lOptimized i i 

resulting in an which is 

supposedly worse than the tensor we started with. 

For readers who are familiar with the Krylov subspace 
based MPO optimization for ground states |ifo) 


|i?o) = arg min (T|7J|T') , 


(F14) 


we mention that here, the main difference is that the 
Krylov subspace for the ground state search is based on 

m 


/C„ = span (T-, i74', ..., , (F15) 

while the algorithm for the TADM is based on 

/Cn = span {Geo, , £'"-^ 00 } , (F16) 

according to Eq. ( |E13[ ). If we start the ground state opti¬ 
mization with an optimal MPO IT*) = |i?o), this optimal 
is part of the Krylov subspace used for the optimiza¬ 
tion. Therefore, the optimal solution is already obtained 
in the first step. Contrary to the ground state optimiza¬ 
tion, the algorithm for the TADM starts with €go, which 
does not convey any previously gained information about 
the optimal solution. 

Of course, starting the optimization with an already 
perfect solution is just an extreme example to illustrate 
the problem: The algorithm, as it is so far, does not learn 
from previous optimization steps. This is actually easy 
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to fix. We just include the old MPO tensor M°^j^ into 
the subspace basis we use for finding the optimal tensor 
The subspace we obtain in this fashion is no 
longer a pure Krylov subspace, but this is actually of no 
real importance. 


So, we still use Eq. (F9l 


fl optimized _ \ '' 

'[fc] - 




(F17) 


but for the first basis tensor we now take the nor- 

[fc] ’ 


malized (E31 old tensor 


(|(1) _ , 

[fc] \\<IM\\ 


(F18) 


Appendix G: Orthogonality proofs 

Here, we provide some or thogo nality proofs. Fir st, we 
start with the matrices Ain (ElOI respectively (Ell I 
introduced in appendix [Eland after that, we look at their 


-^0 


tensor network version Mj^j (F18l and (F19l. That is, we 
first show for the general method of appendix (1 


{(!:Mj<n-2\<^Mn) =0 
{(LMj<n-l\^Mn) =0, 


(Gl) 

(G2) 


which expresses the demanded orthogonality according 
to Eq. (E3). 


We start with the proof of Eq. (Gl I, which is a stan¬ 
dard proof for Krylov subspaces 


(2) 

For the second basis tensor , we could use the for¬ 


merly first tensor = (Ilj/fe (Fill. But 

following the same line of argumentation which led to 
Eq. (ElO), we find it actually more advisable to use di¬ 
rectly Eq. (F121 (the analog tensor equation of Eq. (ElO l) 


i-i 

Mw= (H (F19) 

j/fc P=1 


a. Further modifications 


The algorithm has still room for further improvements, 
which are explained in detail in the following appendices. 
Here, we just outline what can still be done: 

• Tensor networks exhibit a versatile gauge freedom. 
Ghoosing an optimal gauge is an essential ingredi¬ 
ent for a successful tensor optimization. It is advis¬ 
able to use a non-standard gauge, which is tailored 
for the weighted norm ( |E3| ) used in the algorithm. 
This is explained in appendix |H| 


{€Mj\€Mn) ^ {(LMj\(!:{€^Mn-i)) 
{€M,+,\€Mn-i) 


(G3) 


The matrix -Mj+i lies in the subspace 
spanjAli, AI 2 , ■ ■ • which is orthogonal to 

Ain-i (according to the definition (E3)) for j-l-1 < n—1. 
Hence, for j < n — 2, the overlap (^€Aij\£Ain) is zero 
such that Ain needs only to be orthonormalized against 
Ain-2 and Ain-i, as claimed before. 

While the poof of Eq. (Gl) only uses typical features of 
Krylov subspaces, the proof of Eq. (G21 takes in addition 
advantage of Eq. (ElO I, which is specific to the problem 
at hand. We start by rewriting Ain as 

_ __ n— 1 

Ain €go - ^ aj€^Aij 

i=i 

n—1 




1=1 


= CM 


(G4) 


with 


• The tensor optimization is done in many successive 
sweeps. With a small alteration, we can take ad¬ 
vantage of previous optimization sweeps to speed 
up the convergence. This is a special feature of the 
TADM algorithm, which we dubbed overarching or¬ 
thonormalization. For more details, see appendix |l] 

• Density matrices are always Hermitian matrices. 
This implies a symmetry which can be exploited 
and allows to map complex valued MPO onto real 
valued MPO with the same bond dimension. While 
most symmetries are connected to some special 
properties of the physical system in question, the 
Hermitian symmetry is common to all physical sys¬ 
tems. For more, see appendix 


Aij 


n — 1 

:= go -^aj€Aij. 


(G5) 


In this equation, the = {€Ai j\go) (E6) are chosen such 
that the ajCAij annihilate the CMj components in go. 
Hence, we find 


{€Aij<n\Mj_)=0. 


(G6) 


With this result, we can prove Eq. (G2| 




^ {€Mj+i\Ai±). 


(G7) 


24 





























Figure 14: The MPO norm (Ilj ^U]III; ^[i]) 
expressed as weighted tensor norm 


(H4| can be 

M[fc]). 


The 


MPO tensors can be gauged such that L[i...fe_i] = 1 and 
R[fe+i...„] = 1. This situation should be compared with the 
situation in Fig. |15[ where such a gauge is generally not pos¬ 
sible. 





Il'^W 


= £2 


ly^5 


1^5 


Figure 15: With the help of the effective tensor operator C^yk], 
the weighted MPO norm (]~[j |H4l can be 

expressed as weighted tensor norm \C^\k] | M[fc]). 


The matrix -Mj+i lies in the subspace 
span jAd i, A^ 2 , ■ ■ ■, Af,-+i}. Hence, according to 
Eq. (G6l, we find = 0 for j -|- 1 < n and 

with that (£Af,<„_i|CiA4„) = 0 (G7|, as claimed in 

Eq. del. 


1. Tensor network method 


Finally, we have a look at the tensor network based 
method, as it was explained in appendix |F 3| for MPO. 
Here, we face two main differences compared to the gen¬ 
eral case: As a first difference, we only alter one MPO 
tensor at a time, which implies that we keep all 
other MPO tensors constant. This is e.g. the 


reason for the appearance of the term (nj 5 ,ifc ^[j])' 

Eq. (F19). But with due diligence, one finds that this 
does not alter the line of argumentation used above. 
As a second difference, we no longer deal with a pure 
Krylov subspace, beca use o f the extra role of the MPO 
tensor (F18l. That is, the MPO Afi = 


l(i) 
'[fc] 


M[j] (F 81 is generally not orthogonal to = 

n 


Appendix H: Gauging the MPO 


MPS and MPO contain a gauge freedom, which can be 
exploited to improve the performance of the algorithm. 
In our case, the optimal gauge is not given by the stan¬ 
dard canonical forms, which are used in most other algo¬ 
rithms |23| . 

The gauge freedom of the MPS/MPO stems from the 
simple fact that one can always insert a matrix 0 [j] and 
its inverse ajjy between two MPS tensors M[j] and 


M“f Mf.7 

bl' 


b+iic 


r.f 

b]o-j 


/3m -1 A* 
‘bi “bi ' 




(HI) 


''[ 3 ] 


''[3+1] 


and replace them by —>■ M[j]a[j] and —>■ 

The aim of this section is to describe a 
method for finding beneficial matrices ayy. 

For many application, it is advisable gauging a MPS M 

(H2) 

in such a fashion that the norm of the entire MPS M 
reduces to the norm of the single MPS tensor M[/;] 


((>:Mi|g:a4„) ^0, (G8) 


while for all other MPO Mi = T\.j^k ^ 
line of argumentation used above still holds. 


[j] (F8l, the 


i.e. 


(eiM l<^<n—1 \€Mn) = Q. (G9) 


In other words, each new tensor needs to be or- 

thonormalized against and This result will 

be of great importance for the speed up explained in ap¬ 
pendix [Tj 


VM 


[fc] 


\m\ ^ 


\k] n 


special gauge 


(H3) 

This can be achieved by the canonical form described e.g. 
in Sec. 4.4 of Ref. |2S|- Since the norm of the single tensor 
M [fc] is easily controlled, this canonical form is extremely 
helpful for MPS based algorithm which have to fulfill the 
common side condition ||Af|| = 1. 

In our case, we h ave to deal with the weighted norm 
= 1 (241 as side condition, which favors a dif¬ 
ferent kind of gauge. If this side con dition could be sim¬ 
plified in the same way as Eq. (H3), i.e., = 
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11M 11 for arbitrary M , the remaining overlap optimiza¬ 
tion (FIO) would be trivially solved by Eq. (Fill without 


the necessity to approximate the optimal MPO t enso r 
as a sum of many tensors as in Eq. (F9l. 

Unfortunately, a general = ||M[j,]|| cannot be 

achieved by simply gauging the MPO. Still, we should 
attempt to get as close as possible to this relation to im¬ 
prove the performance of the entire algorithm. So, let us 
have a closer look at the weighted norm 




= 


= (f 


j^k l^k 

j^k l^k 


^[k] 


(H4) 


where we ha ve i ntroduced the effective tensor operator 
see Fig To have « ||M[i,]|| for all 

M[fc], we need « 1. 

We remark that we actually never calculate explic¬ 
itly, since the effort to do so scales with the fourth power 
of the bond dimensions of the MPO M, while the scal¬ 
ing of all operations presented so far does not exceed the 
third power. Therefore, we only calculate with the com¬ 
ponents of Cp.j, which are and 


(H7l is absent and the remaining and 

can be treated as matrices. Using the trivial fact that 

L = with ^/ii = \f\j and similar for R, 

choose the gauge matrices to be 


we 


a[fe_i] = and = \/R 


^[fc] 




-1 

[fe + 1- 


(H8) 


and the gauge transformation (H71 result in 


0 

([HT} 

O 





— 


|H7) 

®[fc] 

pl^ 


(H9) 


With these identities, the MPO norm (M|M) reduces to 
the tensor norm (M [fc] | M [j,]), as for the canonical form. 
Now, we come back to the weighted MPO norm 
where and R|^T|!(*i...„] carry ay-index 

and are given by Eq. (H6). In this case, the gauge matri¬ 


ces and do no longer have the same dimensions 

as and R|]^Z|^i...„] and hence, Eq. (H 81 can not be 

used to obtain the gauge matrices. With 
multi-index ^ = (^1,7)) L^^..fc_i] 


the 


help of the 
can still 

be written as matrices and be decomposed by a singular 
value decomposition 


and 


T ifk 

^[1-k-l] 

■rCm 


= 

= u; 




[L,k-l]^[L,k-l] ''lL,k-l] 






[R,k+l\^[R,k+l] ''[R,k+iy 


VA, 


(HIO) 


p2 fj.k-ltkk-lk-kfkk _ T Atfc_l7fc_l/ifc_l j-2 Ik-llkTik-klkk-k 

'~'[k\SkrTk ~ ’ 

(H5) 

2 


as shown in Fig. 15 Hereby, are the tensors of the 
MP O representing the squared commutator operator 
(jlTj) (where the square in C^^-j is just symbolical, as in 
C^). The left L[i...fc_i] and right R[fc+i...„] can be cal¬ 
culated iteratively as 


T Mi Ij Mi 

A-j] 


■p Mi — 17i — 1 Mi — 1 

Aj-n] 


T Mi — 17i — 1 Mi — 1 
Mi — 1 Mi 


( a ::: 


Mi—iMi 

bUj 


— iMi 


pMi7iMi 

■■^b+i-" 


(H6) 


If we perform a gauge transformation as in Eq. (HI I, we 
obtain 


T k-lk 

fiP'rfk 


■RM7M 


a 

p,i/ 




r (^* V 

-.(“S (H^) 


To mimic the effect of the inverse square root in Eq. ( |H8[ ), 
we define the gauge matrices as 


*[fc] 


n[/c-l] — ^[L,k-l]^lL%-l]^lL,k-l] 


~ ^lR,k+l]^lR,k+l]^[R,k+l]- (Hll) 


Inserting these gauge matrices into Eq. (H7l, does not 


turn and Rfj;,., 

’ MM 


into identities as it was 


possible for and (H9l, but at least 

and R|^T|^*i...„] get closer to the identity. 

We remark that this procedure is not optimal and 
could still be improved by more complicated methods. 
For example, if we use the gauge matrices Ml to 
transform and according to Eq. (H7l, 


we could iterate the procedure and use the transformed 
and Rn,_|_]^...„] to obtain further gauge matri¬ 
ces. However, in our applications, a single gauge trans¬ 
formation turned out to be quite profitable, while further 
iterations had no great impact. 


1. Decomposing C 


[fc] 


Now, we need to find beneficial gauge matrices To 
understand the procedure, it is helpful to look at the 
standard norm {M\M) without the commutator operator 
(Fig. 141. Here, the y-index in and R|^T|^\...„] 


So far, we just considered the left and right 
and R{^T|^\...„] separately to obtain the gauge matrices. In 
case of the standard MPO norm (M|M), this separated 
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treatment is perfectly justified (see Fig. 141, while for 
the weighted norm and 

are connected by the squared commutator operator 
(Fig. 151. Therefore, the natural object to consider is the 
effective tensor operator . As already mentioned, cal¬ 
culating the effective tensor operator is is numerically 
expensive. Fortunately, it will turn out that we do not 
require calculating explicitly, but for the moment, let 
us pretend we have done so. We start by replacing the 
singular value decompositions in Eq. (HIO) by the more 
suited decomposition 




iLfJ.k-1 


= u: 


ihi' 










with the multi-indices = (/j-k-i, Mk, Mk, o'k, crk) and 
= {p-k-i, fJ-k-i, fik,^k , gfc) - These new matrices 
can now be used in Eq. (Hll) to obtain better gauge 

-1 


matrices a^k-i] = and 


^[B. k+i]^[R /c+i]^[^>fc+i]> take the full operator 

into account. 


a. Successive decomposition of C 


[fc] 


Now, we show that we do not require calculatin g 
explicitly. The needed matrices D and V of Eq. (H12) 


can be obtained relatively cheap by successively decom¬ 
posing the components and along 

the indices which connect the components. To demon¬ 
strate this procedure, we assume that we like to cal¬ 
culate the gauge matrix 

We start by rewriting as matrix 

with the multi-index = {flk-iT fJ-k-i) and apply a QR- 

decomposition 


■C7fc-l 


^[L,k-l]' [L,k-lY 


(H13) 


Next, the freshly obtained matrix multiplied 

kik-iik 


into the MPO tensor ( C?, 




'fc-i / r 
'[L,k-1] 


2 y 

m). 


\ V akO-k 


(H14) 


Now, we repeat this procedure and rewrite the modified 

~ / - \ ^7fc 

tensor C^j,j as matrix with k = {ak,o'k,v) and 

perform another QR-decomposition 


(Sf.) 


KV „nik 
%C,k] [C,kY 


Finally, we multiply this into R^y^.^ 




_ rr^'nik 

^[/c+l---n] ~ ' [C,fc]^[fc+l---n]* 


(H15) 


(H16) 


This new tensor R[fc_|_i...„] should now be used in 


Eq. (HIO) instead of R[fc_|_i...„], i.e. 


R 




= u, 




_ T~\V 




(H17) 


with ^ = {fj,k,r]). Putting these three decompositions 
(H13l, ( H15[ ), and (H17l together, we obtain 


l|2£} T j~2 -p 


— Q[L,k-l\q[C,k]U[R^k+i\D[R^k+l\y[R,k+l\- 


(H18) 


Since [q[L,k-i]q[c,k]U[R^k+i]) q[L,k-i]q[c,k]U[R^k+i] = 1 


has the same isometric property as UyR k] in Eq. (H12l, 
we find that Eq. (H18l is a correct singular value de- 


compositions of Cy. H ence, we can use D]^R k+i] and 

«[fe] = 


^[i?,,fc+i] of Eq. (H17l for the gauge matrix 


^[R,k+i\^[R^k+vy[R,k+i] In order to calculate the 

gauge matrix we proceed in the same spirit, only 

in the opposite direction (first decomposing R[j,_|_]^...„] 
then the modified and finally the modified L[i...fe_i]). 

We like to remark that in our algorithm, we optimize 
the MPO tensors M[j,] in ascending, as well as in de¬ 
scending sweeps of the index k. In each of these alternat¬ 
ing sweep directions, only one of the two gauge matrices 
a[fe_i], is calculated while for the other, we assume 
that the result obtained in the last sweep in opposite 
direction is a sufficiently good approximation (although 
one could also consider calculating both gauge matrices 
anew each sweep). That is, if the next MPO tensor we are 
going to optimize is Mj^.], the last tensor that has been op¬ 
timized should either be or In case 

has been optimized last, we use the gauge matrix a[fc_i] 
to obtain M[k-i] and Mj^], 

while after the optimization of we need the gauge 


matrix ar,( for the transformation Mr 


''[/c] kiaiiokkiiikiakkk,!!. —?• 

M[fc+i] ay M[^.+i]. 

For the gauging of the MPO tensors, we need to in¬ 
vert matrices respectively their singular values. Numer¬ 
ically, this procedure might be troublesome. Therefore, 
one should regularize the gauge matrices. Further, it 
might be helpful to bring the MPO in their canonical 
form before re-gauging them, since the canonical 

form is already a good approximation, which can be ob¬ 
tained without the need of inverting matrices 


and 


2. Physical gauge 


The effective tensor operator 


nk-lUk-lUkUk 


(H5I 


->2 A 

carries four tensor indices iik-i, k-k-i, k-k, k-k correspond^ 
ing to the auxiliary bonds and two (multi-)indices d'k,(Jk 
corresponding to the physical dimensions. Of these six 
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indices, only the four auxiliary indices are effected by the 
MPO tensor gauge. To obtain an effective as close to 
the identity as possible, we can also introduce a “gauge” 
for the physical indices, although this resembles more a 
transformation than a gauge. We remark that in our 
applications, the benefits of this transformation were far 
less pronounced than the benefits of the gauging applied 
to the auxiliary indices. 

The idea is to place a matrix and its inverse 
between the physical bonds of the MPO tensor and 
the tensor respectively C^[k] (H5l 


' [k]ak<Tk 


[k\(T'f^ [k\(T'^(Tk 

q 2 p.k-ifJ^k-iP'kfJ'k 




[fc], 


Mk]cr'k'^k- 


(H19) 

Obviously, this transformations keeps the weighted norm 
unchanged. But we also have to consider 
that the overlap {go\€\M) (Fig. is effected by this 
transformation of the physical indices with b^k] and has 
to be transformed accordingly. 

The procedure to obtain the optimal gauge for 
the physical indices is similar to the procedure used for 
finding the optimal gauge a^k] for the auxiliary indices. 

We start by writing as matrix {pfk]^ with the 

multi-index ^ = {ftk-i, fJ-k-i, fik, fJ'k, ^k) and the remain¬ 
ing physical index Ok and perform a singular value de¬ 
composition 






= U]^k]D[k]V]^k]- 


(H20) 


The transformation matrix is now obtained as 


^[fc] — ^\k\^\k]'^[k]- 


(H21) 


As before, the singular value decomposition of can be 
obtained by decompositions of its components 
C^j,j and (H5l along their connecting indices. 

That is, we write and as matri¬ 

ces with the multi-index n = (/i, /i) and perform QR- 
decompositions 


T K7 

■rK7 


(l[L,k-i\'>’[L,k-i\ 

<l[R,k+i\f[R,k+i\- 


Then, the two r matrices are multiplied into 


[fc] 




q:/3 


:= r. 


Q;7fc- 

[L,k- 


‘.1 (ci.i) 


ik-iik 


[fl.fc-l-1] 


(H22) 


(H23) 


/- \r]crk 

(Cffcjj With r, = 
{a, P, ak) and perform a singular value decomposition 


Finally, we write as matrix 


Cffc] = U[k]D[k]V[k], 


(H24) 


which delivers the matrices and needed for the 


i _A . . 

transformation matrix b\k] = ^[fc]^[fcf%] P21 . 




Appendix I: Speeding up convergence by overarch¬ 
ing orthonormalization 


In this section, we introduce a method which allows for 
speeding up the convergence of the algorithm presented 
in appendix |F 3[ The key to this method is the insight 
that essential information gained in previous optimiza¬ 
tion sweeps can be passed on to later optimizations to 
obtain a faster convergence. 

To understand this new method, we start by review¬ 
ing the general MPO optimization procedure. So far, we 
mainly discussed how a single MPO tensor M[/j] (F17i is 
optimized. During the optimization of the tensor M[/j], all 
other tensors are kept constant. Since these ten¬ 
sors are most likely suboptimal, the optimization 

of a single tensor will generally not allow us to ob¬ 
tain the globally optimal MPO. According to Eq. (F17l, 
M[fc] = ck; • Mj^j is optimized by summing up several 


weighted basis tensors m[ 7 On the one hand, the more 


dO 


'[fc]- 


we sum up, the better the result for the tensor 
On the other hand, an exhaustive optimization of a single 
tensor M is a waste of computation time if the remain¬ 
ing tensors are still far from optimal. Therefore, 

in practical applications, one will usually just sum up a 
few Mj^j and instead perform many repeated sweeps of 
the index k (which means we optimize the first to the 
last tensor and then start over again and again). During 
these sweeping cycles, the same tensor Mj^,] is optimized 
many times, each time with different environmental ten¬ 
sors which also have been optimized in between. 

As a result of the first optimization sweeps, the ten¬ 
sors M [j] are likely to change substantially. But with each 
completed sweeping cycle, the modifications of the ten- 


1[j] are converging 


sors M[^] should subside, since the 
towards their optimal value. When the changes have be¬ 
come sufficiently small, the environmental tensors 
can be considered to be approximately constant between 
two optimization cycles of M^,]. This puts us into the 
position to reuse information won in previous optimiza¬ 
tions of To see this, we assume for a moment that 
the environmental tensors do not change at all, 

and compare the situation where we perform many short 
optimizations of the MPO tensor M with the situation 
where we do one long optimization. Here, by short op¬ 
timization, we mean that we sum up just a few basis 
tensors to obtain the new MPO tensor which 
corresponds to the situation in the algorithm. 

In order to have a fair comparison, we assume that in 
total, the many short optimizations genera te th e same 


(F17)as the 


amount of linearly independent tensors M 
one long optimization. The crucial difference is that for 
the one long optimization, the tensors are not only 


linearly independent, but also orthonormal (|E3|). Solely 
for orthonormal 


/i(0 


Tfc]’ 


, Eq. (E6 ] provides the optimal over¬ 


lap ai for Eq. (F17]. In other words: one long optimiza- 
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tion is superior to many short optimization. 

What we intend to achieve is that the many short op¬ 
timizations we use in the algorithm act the same way as 
one long optimization. That is, we have to find a way 
to maintain the orthonormalization of the basis tensors 
Mjpj over many optimization cycles. In this context, the 
observation made in appendix |G l|is of speci al rele vance 


that any newly generated basis tensor (F19l is al¬ 
ready orthonormal to all basis tensors after it has 

been orthonormalized against the two basis tensors 

and Mj^j This limits the amount of information we 
have to transmit from one optimization cycle to the next 
to ensure that all basis tensors generated in consec¬ 
utive optimizations cycles are orthonormal. 

The basis tensor is always given by the result of 


the optimization round before (F18l. Since we assume 


that all tensors M[j] change only slightly from one short 
optimization to the next, the first basis tensor is 
roughly the same for the different optimization rounds. 
Hence, the orthonormalization against the slightly dif¬ 
ferent in the many short optimizations should ap¬ 
proximately have the same effect as the corresponding 
orthonormalization against in the one long opti¬ 

mization, which we like to mimic. Therefore, the only 
extra piece of information we have to transmit from one 
short optimization to the next is the lastly generated ba¬ 


sis tensor 


'[fc] 


which we denote as 


With that, we suggest the following improved rules to 
generate the tensors 


M 


M 


(1) (fiD 


[fe] 


[fe] 


Mold 


[fc] 


= M 


.(l>2) 

'[fc] 


1-1 

{l[My^y{€go-J2a,€^M,). (II) 

j^k p=l 


In comparison to the one long optimization, the concate¬ 
nated short optimizations need to do a few extra calcula¬ 
tions to patch the different optimizations together. But 
the more important comparison is not the short optimiza¬ 
tion versus the long one, but the new method presented 
in this section versus the old method presented in ap¬ 
pendix |F 3| 

( 2 ) 


In the worse case scenario, the basis tensor ivij^j 

.(last) 


M ,7 = 


has an overlap q ;2 = 0 (E6l, i.e., the basis ten- 
sor of the new method is completely useless. Under 

this condition, every further basis tensor produced 

by the new method will be identical to the basis tensor 
^[fcf^^^^ in the old method. That is, the maximal “dam¬ 
age” in the worse case scenario is that we have effectively 
one basis tensor less. 


We explained the advantages of the new method for the 
idealized scenario that the tensors M[j] do not change at 
all, but it should be clear that also for slightly varying 
M[j], a positive residual effect remains. The less the ten¬ 
sors M[j] change from one optimization cycle to the next, 
the better are the results we can expect. Therefore, we 
might use the old method described in appendix |F 3| as 
long as we detect strong changes in the M y]. When these 
changes drop below a preset threshold value, we might 
change to the new method presented here. 

So far, our main argument for the new iteration 
Eq. has been the overarching orthonormalization. A 
much more trivial point might also be that the old itera- 

(III runs 


h2) 

'[fc] 


tion schema without the new definition for 
a certain risk generating each optimization cycle some 
basis tensors which are very much alike the Mj^j of 
the round before. This is likely to happen if the changes 
in the tensor M|j.] per optimization cycle are only small 
compared to the changes which are necessary to reach 
the optimal tensor That is, especially when the 


basis tensors Mj^j prove to be badly chosen, the proba¬ 
bility is high that these bad basis tensors are reproduced 
to a great part in the next round. The new definition for 
- ^2) 

breaks this vicious cycle. 

We like to finish with a practical advice: The prereq¬ 
uisite for the overarching orthonormalization to work is 
that all tensors only change slightly from one optimiza¬ 
tion sweep to another. We also have to take care not to 
introduce any changes gauging the MPO, as described 
in appendix That is, we have to use the adequate 
gauge for as well. Now, we find that after several 

optimization rounds the gauge becomes approximately 
statical, as well and only changes slightly from sweep to 
sweep. Therefore, one might also keep the old gauge of 
- at least theoretically. Unfortunately, we learned 
that for the QR-decomposition, some software libraries 
take care that the diagonal elements of the upper trian¬ 
gle matrix R has only positive diagonal elements, while 
other libraries do not. In case of combined ascending 
and descending optimization sweeps, gauging with neg¬ 
ative diagonal elements can induce alternating signs of 
some tensor elements, wrecking the entire procedure, if 
the gauge for is not adapted. 


1. Comparison with other problems 


At the beginning of appendix |F 3| we shortly compared 
the approach for the time averaged density matrices with 
the Kr ylov s ubspace based MPO optimization for ground 
states (F14). Here, we refer again to the example of the 
ground state search to obtain a better understanding of 
the ingredients which are necessary for a successful ap¬ 
plication of the overarching orthonormalization method. 

Usually, the MPO ground state search consists of many 
optimization sweeps, where for each tensor optimization. 
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we build up a small Krylov subspace (F15), as well. 
Hence, we can also aim for an orthonormalization which 
overarches many optimization cycles. For the ground 
state search, this can be obtained with slight modifica¬ 
tions, i.e., we need to transmit the last two Krylov sub¬ 
space basis (see appendix Elal. But unfortunately, this 
will not help us to improve the algorithm. 

The important difference between the optimization of 
the ground state and the TADM is fo unde d in the way 


the equation (|F9| is executed. 


Except for the choice of the symbols and their interpre¬ 
tation, the ground state search uses the same type of 
equation to find the optimal solution. The crucial point 
is that for the ground state search, the calculation of the 
optimal coefficients ai (E6 \ and the execution of the sum¬ 
mation can only be done at the very end, when all 
are known. Another way to say this is to state that the 
optimal value of ai might depend on some which 
are calculated much later. For the TADM on the other 
hand, the optimal a; can be calculated directly after a 
new “Si has been generated. This allows us to sum up 
the components to a partial sum, immediately af¬ 


ter they have been computed. That is, soon after the 
have been generated, we can forget them completely. 

Principally, it is a solvable problem to memorize all the 
for the ground state search. But then we also need a 
strategy which takes into account that the environmental 
tensors are not really constant. As a consequence, 

calculations done with the become increasingly im¬ 
precise when they get older. Without going into further 
details, we state that for the ground state search, these 
problems seem to eat up most of the advantages one could 
hope to gain. 

In conclusion, we find that the overarching orthonor¬ 
malization method appears to be quite specific for the 
problem at hand, i.e., a linear problem with a bilinear 
side condition (weighted norm (E3)). 


Appendix J: Mapping Hermitian matrices onto real 
matrices in the MPO framework 

Any density matrix is Hermitian, which entails a re¬ 
dundant encoding. In this section, we show how this 
redundancy can be exploited by mapping the complex 
density matrix onto a real matrix, which contains the 
same amount of information but without the Hermitian 
redundancy. We remark that this mapping is not suit¬ 
able for the double MPS ansatz (appendix [M|, but in 
the MPO framework, this mapping can be performed ef¬ 
ficiently and allows us to obtain an algorithm which is 
entirely based on real numbers and hence, runs faster. 

Exploiting symmetries to obtain a faster algorithm is 
a quite common approach. Taking advantage of the Her¬ 
mitian symmetry is nonetheless unusual, since most sym¬ 


metries are based on special properties of the physical 
system, while the Hermitian symmetry is universal and 
based on the mathematical formalism of quantum me¬ 
chanics. We are not aware if a similar approach for MPO 
has ever been presented in the literature. Since the sym¬ 
metry is universal, the corresponding algorithm can be 
applied to all physical systems. 

Hermitian matrices are ubiquitous in quantum me¬ 
chanics and it might surprise that they are not exploited 
more often in numerical algorithms. One reason why it 
is difficult to take advantage of the Hermitian symme¬ 
try is that there are no universal matrices A, B which 
could turn each Hermitian matrix M with the correct 
dimensions reversible into a real matrix Mreai 

AMB = M,eal. (jl) 

For a matrix M = rrijk \j) {k\, hermiticity mjk = 
is a combined property of the bra and ket vector |j) and 
{k\, while the matrices A, B each only “know” either of 
them, i.e. bra or ket. To turn a Hermitian matrix into a 
real matrix, we need a linear map it which receives the 
combined information of \j){k\ as input. In this context, 
it is helpful to vectorize all matrices 

M = E iT^jk\j){k\ '^rnjk\j,k), (J2) 

jk j,k 

which in turn allows to write any linear map in form of 
a matrix J^jkim ^Uk)(im) I j, k){l,m\. Now, a suitable map 
il to turn a Hermitian matrix into a real matrix is given 

by 


3 

+ E[|-^'’ ^){^Ak\ + {k,j\) 

^ 3>k 

+i\k,j){{j,k\ - (fc,j|) 


(J3) 


with i = The factor was inserted to ensure 

v2 


fc)(j,fc|, (J4) 

j,k 


where W is given by {u(jk)(im)\j,k){l,m\y = 

|Z, m) (j, A:|, i.e., to obtain the Hermitian 

conjugate, il is treated as a matrix. 

Up to now, we just remarked that any density matrix 
is Hermitian. Since we search for a M with g = qq — c^LM 
(271, the term c€M has to be Hermitian, as well. Any 
phase factor in c = |c|e*'^ can be absorbed into M, 
which allows us to demand that c S M. With that, M 
has to be antihermitian M = — A/1 to have a Hermitian 
c£M = {c€M)\ Since we prefer M to be Hermitian, we 
include an extra factor i = 
now use the approach 


— 1 into Eq. (271, i.e., we 


g = go — ci<tM. (J5) 
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Multiplying this equation from the left with it (J3l and 
inserting the identity it^it = 1 (J4|, we obtain the real 
equation 


ilg = it£.o 

y Qo 

-real ^ 


(J6) 


with c = {i€M\go) = G K and the side 

condition 


^^real^^^real I ^real^^^real^ _ ^ 


(J7) 


Since (eeal^realjirrealjy^real^ ^ (G:M|g:M), this is ex¬ 
actly the same side condition we used all the time (241. 

So far, we just denoted as a real-valued map, but 
have not proved it. The map it (J3l was constructed 


such that it maps Hermitian matrices onto real matrices, 
but it is not evident that this entails to be 

real, as well. One could confirm this either via a detailed 
component by component check or simply by noticing 
that maps arbitrary real matrices onto real matrices 
and hence cannot contain any imaginary elements. Still, 
we have to come back to this point in appendix |J 1[ when 
we look at the MPO structure of £''®®*. 

Now, we like to have a look at what we have found. 
The main idea of the entire transformation was to have 
a faster algorithm. Any (linear) map acting on n x n- 
matrices corresponds toanxnxnx n-tensor. If we were 
relying on standard matrix and tensor multiplication, us¬ 
ing such a huge tensor would be highly questionable. For 
MPO calculations on the other hand, the physical dimen¬ 
sions are often of secondary importance. The decisive 
characteristic is the bond dimension. In this context, it 
is relevant to note that for maps which map Hermitian 
matrices onto real matrices, the outer product 


= (X) it, = ill O il2 O its o... O il„ 


(J8) 


i=l 


describes a mapping from Hermitia n m atrices to real 
matrices, as well, with it^it,^ = 1 (J4|. This is eas¬ 
ily checked applying it^ to a suitable base consisting of 
outer products of Hermitian matrices Hi. The 

structure of itg, corresponds to a trivial MPO with the 
bond dimension being one. 

Further, we need to provide a single MPO represent¬ 
ing the commutator operator £ to perform the mapping 
g-reai _ \ye cannot use the definition of the com¬ 

mutator operator £ = [H,...] fTTf, since the mapping 
cannot be decomposed accordingly. To see this, remem¬ 
ber that for a mapping like i7reai = iti7, the matrix H 
has to be vectorized, i.e., it acts on the bra and ket side. 
Vectorized matrices ilH and itAf do not allow a standard 
matrix multiplication ilHilM. If we rewrite iiH and ilM 
as matrices, the resulting matrix product is no longer the 
correct multiplication needed for il^il = 1 to hold. 

Many frequently used Hamiltonians H possess rela¬ 
tively simple MPO descriptions, with bond dimensions 


which are small compared to the bond dimensions needed 
to obtain suitable MPO descriptions for M (J5l. In 
this case, it is reasonable to write £ as a single MPO 
as described in appendix [L] instead of using the defini¬ 
tion £M — HM — MH (T^ . As a bonus, this enables 
us to use a MPO compression algorithm to p re-c ompute 
£^, which is needed to calculate(£Atj|£AIfe) (E3l. Com¬ 
pared to the explicit use of {[H, M.j\ \ \H, M.k\), this often 
entails a speed up. 

Since the Hermitian to real mapping MPO ilg, (J8l 


has the trivial bond dimension one, the real-valued MPO 
= it^eo, = it^tCit^ and M^®"i = ii^M 

(J6) have the same bond dimensions as their Hermitian 
counterparts. Further, we remark that the mappings 
Qo and £ —jj^ve to be applied once, at 

the beginning. Afterwards, we can compute with 

the same algorithm we would have used to obtain the 
Hermitian M. At the very end, when AF’'®*‘* is calculated, 
one final mapping gives us At = ilLAF''®*^*. 


1. Real MPO with complex MPO tensors 

When we stated that and £“'®‘^^ are real-valued ob¬ 
jects, we indirectly included the assumption that they are 
described by a single matrix or tensor. If we represent 
gig®*^* and £''®‘^* as MPO, they are decomposed into a prod¬ 
uct of tensors. These MPO tensors no longer have to be 
real. Usually, the simple transformation £''®‘^' = 


(J6) produces complex-valued MPO tensors. In this sub¬ 


section, we describe a procedure to turn the complex¬ 
valued MPO tensors of gQ®*^* and £''®*^' into real-valued 
tensors. 

In the following, we assume that we deal with open 
boundary conditions for the MPO. For most physical sys¬ 
tems of interest, it should be no problem to find MPO 
with open boundary conditions for and £''®*‘*. Un¬ 
der certain circumstances, periodic bo und ary conditions 
might be advisable for the MPO Af"'®'^^ (J6l. But AU®^' is 


generated by the algorithm and not the result of a trans¬ 
formation. Therefore, the MPO tensors of AF*'®**' are real 
by construction. 

Let us look at an arbitrary real operator O represented 
as MPO in its left-canonical form 


O = 




I lai I 10102 
^[1]oi'^[2]iT2 


I lOin-2<^n — l pOin-l 


(J9) 


Here, the aj are multi-indices comprising all physical in¬ 
dices of the MPO tensors (i.e., in case of the map £"'®®'', 
the MPO tensors have four physical indices). For an 
MPO in a left-canonical form, all MPO tensors U[j] are 
left-normalized except for the rightmost tensor R, i.e.. 


E 

— 1 


(U0i-I02l*y0l-10, ^ 

V .7 kl / \.l\cri 


(JIO) 


respectively Ein Any MPO can 

be brought into the left-canonical form via a repeated ap- 
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plication of a singular value or QR decomposition, start¬ 
ing with the leftmost tensor. For details, see e.g. Sec. 4.4 
of Ref. . 

Using unitary matrices V[j], we can construct new ten¬ 
sors 0[j] 


0 


OCj — iCXj 



(Jll) 


respectively = E/j Vjfp and P“; ' = 

E/3 (^[f-ifO new tensors, an alter¬ 


native MPO representation for the operator O is given 

by 


E 


/-.Qi (-iaiQ2 


n“"-2“"-l pOCn- 
^[n-l]cr„_Uo-„ 


F (J12) 


we can obtain 0[i] as the eigenvectors of W[i] (with all 
eigenvalues being one) or alternatively, via a singular 
value decomposition. The matrix 0[i] is not unique, but 
the important part is that it is always real-valued, in case 
is real, as it is the case for and 
Having 0[i], we can calculate the matrix V[i] via 




fm) 


E 


n* “O'*' 


v* P'1 


I/* Pa 
'^[1] 


(J16) 


With some slight adjustments, we can use the same tech¬ 
nique to calculate the matrix V[ 2 ] an d suc cessively all fol¬ 
lowing matrices Instead of Eq. ( |J13[ ), we now have 

E Tr*a/3||Q7 ]J11| \ " p./3i5 Tr* 7(5 T1 7i 

ol 6 


In appendix|K] we will prove the existence of unitary ma¬ 
trices V[j] such that all MPO tensors and P are real 
valued. Interestingly, we have to demand that the MPO 
representation (J9l is maximally compressed to ensure 
the existence of suitable Vyy That is, we do not allow 
MPO dimensions which belong to vanishing singular val¬ 
ues. 


a. Finding the gauge matrices Vy] 

Once the existence of the unitary matrices is guar¬ 
anteed, calculating them is relatively easy. We start with 
V[i] and note that due to the unitarity of the matrices Vy , 
we have 


where we assume that Vy-i] is already known. To facil¬ 
itate the notation, we introduce 


07 —07 _V^T/*a/3||a7 

^bje “ ^b]('^./3) ■“ 2 ^ b-1] 


^m{<r,P) 


b-1] bU’ 


(J18) 


with the multi-index © = {a,j3). Replacing . by 
we can repeat all the steps above to obtain Vy. In 
short, we calculate \Nyee, = Qy^Qy^, (J14) and 
via a singular value decomposition of VJyee't ob¬ 
tain Og]g (J15l, which leads to = Ee ^bje^b']® 


(J16i. 


Appendix K: Existence of the gauge matrices Vy] 


0 


ai ^ 


^Eu 


P V' 
^[1] 


/3ai 


^ Up]., 


= Eo 


T/* ait 

ill'll '^[1] 


With this, we find 


(J13) 


E ijQi 11* ai _ V - vy 


v: 


ai/ 3 -^ai 7 q* 7 


0 : 1 ,P,7 


-Vo^ 0 *^ 


=: W 


[IjCTidrp 


(J14) 


We remark that E. Ofq.Op]^ = while Wpj.., = 

Y./ 3 ^[i]cT^*[i]i' only equals 1../ iff dim (cr) = dim (,5). 
For real objects as and we know that a de¬ 

composition into real MPO tensors exists. In this case, 
has to be real, as well. 

Remember that the tensor Of], is still unknown, while 

[l]s > 


W 


[i]c 


= Ea bl[i]cr'-^fi]CT' oan be calculated. Since Wpj 


(J14l can be written as matrix equation 


W[i] = 0[i]10f^j, 


(J15) 


In this section, we prove the existent of the unitary 
matrices V[,], which we used in the last section (ap¬ 


pendix J 1 a) to transform the complex-valued tensors 
Up] into real-valued tensors Opj (J17). This proof is 


added for formal reasons only and is of no importance 
for the practical application of the algorithm. 

MPO tensors are not uniquely defined. We look at the 
case where we have two different MPO which represent 
the same object O. 


an-2ari-l pOCn-l 

l]cr„_7crr. 


Oil 

[1]^1 ■ ■ 

...u; 

.ai 

■ ■ 

..O' 


= U Ofi. ■- (Kl) 

oli . 1 

Both MPO are supposed to be maximally compressed 
and in the left-canonical form (JIO i. We like to show that 


for these two MPO, all tensors Up] and Opj (respectively 
R and P) can always be related by unitary matrices 
as in Eq. (Jill. 


For the upcoming proof, we need to shorten the nota¬ 
tion. To this end, we use the Einstein summation conven¬ 
tion, i.e., we imply summation over identical indices. Fur¬ 
ther, we introduce the two matrices and of 


bi 
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given as tensor products of the first j MPO tensors U[fe] 
respectively 0[fe] 


and 


Tr^3°‘3 _ rr®j-i“j-i I 

_ I pi I 


(K2) 


bl 


“ ^b-i] ^b>i 

= o“| 

[i]o-i bWj 


(K3) 


with the physical multi-index = (cti ... g,-). Since the 
two MPO (Kll are in left-canonical form (UTol, we find 


*^b]'^b] = l = OMOM, (K4) 

while generally ^ ^bl^b]- ^b]^[j] 

acts like an identity, when applied to Uy] 


(t^blt^w) t^b] = Uy^ («^b]t^bl) = Uyy (K5) 

1 


Multiplying this equation with Oj^.j we find 
^bl^bl = ^M^bl^b] 

1 ivtj ||^ 

1 = <]^b]- (KIO) 

Repeating the some line of argumentation for 0y]0^^^^0 
as we used for Uy]Uy-^0, we arrive at the conclusion that 

Um = 

%]<■] = 1- (Kll) 

Putting all together, we find 

off ® uffwff 

b] b] b] 

b-1] bJcTj- [j] 

of-fwf “r''^U“)-'“^lP“'’'^.(K12) 

b-i] b-i] bWj b] ^ ’ 


With that, Uyff-^ also acts like an identity when ap¬ 


plied to the MPO O (Kll 


Ub\U^yf = 0, 


(K6) 


which is easily seen when we use the MPO representation 
of O based on the tensors U [jj. On the other hand, when 
we apply Uyff^^ to the MPO O represented as 

^ l|Kl[l^K3( Qa^aj + i ^ Qa„_2^QTi-l pa„_i (K7) 


b+i]'^j+i 


'[n — l](Tri — l ^ri 


we find 

6 = u^^u\^d 


b] b] 


bl 


0“j“3 + l PO 

'-’b + l]^, + l ■ 


=--Ki' 

7-®,- 7t J r7aj r\^3 ^J + 1 


_ r r^i ' ^J + 1 (^0in-20^n-l pa^-i 

~ ^ bl '^bl ^b+l]^i + i ■ ■ • ■ ■ '-’ln-l]cT^- 3 ^a„ 


(K8) 


We demanded that the MPO O is maximally compressed, 
i.e., it contains no vanishing singular values. Hence, the 
expression TZ = built 

from the right-hand tensors of the MPO is invertible. 
Applying this inver se TZ ~^ to the MPO O in the form of 
the last line of Eq. (K 81 as well as to the representation 


in Eq. (K7), we find 


l [K7l p<8l 

^b] = ^b]^b] 


(K9) 


^ * 1 /3 

Multiplying this equation with Oy_f and using the 
identity (K4|, we finally obtain 


Oj_ip 

^b-1] 

6j_i/3 

^b-1] ^b-i] 


^®j7 

^bl 

= 

TT/* TT/^jT 

ol 


w*. 

b-1] b]^j bl 

of 

[J^j 

||K4l 

b-1] bl^j bl 



(K13) 


In the same way, it is easily shown that 


Pi = 


(K14) 


Since we can be sure that for a real-valued operator O 
an MPO based on real-valued tensors 0[j] and P exist, 
we can also deduce the existence of some gauge matrices 
Vyj = Wy] with the help of Eq. (K13l and (K14). 


Appendix L: Constructing a MPO for the commu¬ 
tator operator C 

To construct a MPO representation for the commuta¬ 
tor operator £, first, we need to construct a MPO rep¬ 
resentation of the Hamilton operator H. This is e.g. de¬ 
scribed in Ref. [551157] . 

The commutator operator £ acts on the vector space 
of linear operators with CA = HA — AH. Evidently, this 
can also be written as 

€A = HAl - lAH 
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Now, let us rewrite the commutator operator symboli¬ 
cally as 


= (LI) 


which is to be understood as (iJ(g)l)A = HAl and 
(1 ® H) A = lAH. Knowing a MPO representation of 
the Hamiltonian H = n.H [j], we immediately obtain 


iJ (g) 1 
l(g)iL 


Hh 


bK 


'bls'«' 




(L2) 


where |sj) and (sfj are the ket and bra components of 
the operator A. 

To take care of the minus sign in the commutator, we 
multiply the MPO tensor l[i] g) H[i] with —1. Then, we 
simply have to add the two MPO H g) 1 and — 1 g) iL. 
Adding two MPO is a standard procedure, which is e.g. 
explained in Sec. 4.3 and 5.2 of Ref. |23| . 


Appendix M: Double MPS 


So far, we presented a general computation method 
for the time averaged density matrix (TADM) and ex¬ 
plained in detail, how this method can be adapted for ma¬ 
trix product operators (MPO). MPO are just one exam¬ 
ple for tensor networks. Here, we discuss another (non¬ 
standard) type of tensor networks, where the TADM is 
obtained as a double sized matrix product states (MPS), 
which we dubbed double MPS. 

Formally, a double MPS is a MPS with twice as many 
sites as the physical system has components. Hereby, the 
first part of the double MPS represents the ket-states \ui) 
of the TADM or any other matrix M = 'Yhij ^ij ' l^i)(oli 
while the second part of the double MPS represents the 
bra-states {vj \. The matrix is encoded into the MPS- 
bond which connects the two parts, see also Fig. If 
the double MPS is brought into a suitable canonical form 
m, the basis states |ui) and |ui) encoded in M are or¬ 
thogonal (i.e., {ui\uj^i) = 0 = {vi\vj^i)) and we can 
extract the matrix Xij from the double MPS. This al¬ 
lows e.g. to check whether or not M is a, positive matrix. 
Assuming that the double MPS represents a positive Her- 
mitian matrix, its entanglement entropy of bipartion for 
the half chain corresponds to the entropy of the entire 
matrix M. 

We emphasize that the need for doubling the num¬ 
ber of tensors to accommodate bra- and ket-vectors in 
a double MPS arises from our special ansatz taking ad¬ 
vantage of the commutator, which needs to operate on 
the bra- and ket-vectors at the same time. As a conse¬ 
quence of this doubling, the bra- and ket-part are treated 
independently in a numerical algorithm which optimizes 
tensor by tensor. Therefore, the resulting operator is not 



Figure 16: a) Graphical representation of a finite MPO with 
open boundary conditions, b) Any finite operator O = 
\ui){vj\ can be decomposed into two connected MPS, 
where the connecting bond between the MPS corresponds to 
Xij. Due to the connecting bond, the two MPS actually cor¬ 
respond to two collections of several MPS \ui} and (uj|. c) 
Formally, any finite MPO can be represented as MPS of twice 
the size. 


forcedly Hermitian by construction 

'Y^Xij\u,){vj\ = \vj){u,\. (Ml) 

ij ij 


Still, since we intend to express the Hermitian TADM 
as double MPS, the optimization objective forces the al¬ 
gorithm to come up with a solution which is very close 
to Hermitian. At the end, for most applications, a lack 
of Hermiticity should not be more severe than any other 
numerical imprecision. If Hermiticity is of importance, 
we can still resort to M' = }^{M + M^). 

Comparing Fig. 16 b) and c), we see that due to the 
unfolding process b)^ c), the order of sites in the second 
part of the double MPS is inverted. That is, in a double 
MPS, the tensors Mqj • • • M^n] correspond to the physi¬ 
cal sites 1, 2, • • • n — 1, n, n, n — 1, • • • ,2,1. This ordering 
should be kept, since it is very convenient if we like to 
calculate expectation values, where we need to fold the 
double MPS as in Fig. [T^b). 


1. Implementation 

One of the great advantages of the double MPS is that 
with marginal adaptations, all algorithms we have devel¬ 
oped so far for the MPO TADM can be reused, except 
the Hermite to real mapping explained in appendix]^ 

The MPO based algorithm is operating with three dif¬ 
ferent MPO, representing the original density matrix g^, 
the commutator operator £ and the matrix M, which we 
optimize. All three have to be replaced by double MPS 
(where £ actually corresponds to a double MPO). First, 
we observe that throughout the algorithm, a single multi¬ 
index ffj = (sj, Sj) is used for the two physical indices sj 
and s' corresponding to the bra- and ket-index of the 
MPO tensors. Therefore, it is straight forward to replace 
the MPO structure in the algorithm by a (double) MPS 
structure. Of course, this is just a formal argument and 
we have to ensure the correct correspondence between 
MPO and double MPS. 
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The tensors M[j] of the double MPS which represents 
M are determined by the algorithm. For us, it remains to 
find the correct double MPS representation for and £. 
For many interesting cases, the initial state is a pure state 
Qo — |'I'o)('I'o|- In this case, if |'I'o) can be represented 
as MPS, the construction of the double MPS is trivial. 
On the other hand, if we are not interested in the TADM 
g but in the time average of an operator Oq, a double 
MPS is generally not a suitable choice. For commonly 
used operators las e.g. a Pauli matrix acting on the jth 
sitea-crlt) = (gxa; • crlli (g) the needed 

bond dimension for a double MPS scales exponentially 
with the number of sites. 

Finally, we need to construct the commutator operator 
£, which has formally the appearance of a double MPO, 
where each tensor carries two physical indices. In ap¬ 
pendix ^ we briefly outline the construction of £ for the 
MPO based algorithm, where the commutator operator 
is symbolically written as 




(M2) 


see Eq. (LI). For the double MPS based algorithm, this 
symbolical form can be directly translated into a dou¬ 
ble MPO. That is, £ is the difference of two double 
MPO, where one part of each double MPO represents 
the Hamiltonian and the other part the identity. It is 
easy to verify that for arbitrary double MPS A and B, 
this construction fulfills the property (£H|i?) = (H|£H), 
as it should (181. 


Appendix N: Numerical aspects 

In regard to numerical aspects, the result section fo¬ 
cused strongly on the dependence of the results on the 
bond dimension. Here, we add a few comments concern¬ 
ing convergence properties and numerical precision. 


1. Convergence 

Tensor networks are usually optimized by successive 
local optimizations of one or two tensors at a time. Al¬ 
though it is well known that locally optimizing algo¬ 
rithms often run the risk of getting stuck in a local ex¬ 
tremum, we And e.g. that matrix product state (MPS) 
based ground state search algorithms seem to be widely 
immune against this problem. They exhibits superb con¬ 
vergence properties for many physical systems of interest. 


Can we hope that this is true for the optimization of the 
time averaged density matrix (TADM), as well? 

An important difference between these two algorithms 
is that many commonly used Hamiltonians are sums of 
local operators only, while the squared c omm utator op¬ 
erator £^ used in the TADM algorithm (26) is a highly 
non-local object. If we follow the alternative optimiza¬ 
tion strategy of appendix]^ we even have to use the third 
and fourth power of £. In this case, we occasionally ob¬ 
served strong difficulties in finding the optimal solution 
if we started out with a randomized initial state. In case 
of the standard algorithm based on £^, we noticed con¬ 
vergence into local extrema, as well, but the observed 
deviations were only marginal. 


2. Precision 


A well known source for losses in the numerical preci¬ 
sion are differences of big numbers which differ only by 
a very small number. To soften this effect for the com¬ 
mutator, we recommend to gauge the Hamiltonian such 
that tr(H£io) = 0, i.e 

H ^ H -tT{Hgo). (Nl) 


Still, certain losses in the precision are inevitable. Es¬ 
pecially the T + method (461 discussed in appendix 
is prone to numerical imprecision, since it employs the 
third and fourth ^ower of £. In this algorithm, the value 
of e = ||£papprox|| is minimized and should become zero 
for a perfect gapprox = Q- Now, we have to see that the 
value of £ is not just limited by the achievable numerical 
precision, but during the optimization of e, the average 
improvement per optimization step should also exceed 
the achievable numerical precision. Further, we are actu¬ 
ally interested in the square root of £ respectively in the 


value q = 


IlCgoll 


Especially the combi- 
and T + method seems 


||*^gapp 

nation double MPS (appendix 
to be quite vulnerable. In some of our numerical simula¬ 
tions, the maximal reliable value of q was limited around 
10^ ... 10"^ due to numerical imprecision. An example for 
this effect can be seen in Fig. [^b), where the precision 
of the double MPS T-l- method saturates already for a 
bond dimension D = 128. Of course, we can always re¬ 
sort to a more precise floatingpoint operation, but this is 
usually not supported by the hardware and hence, needs 
a software emulation, which is significantly slower. 
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